MSH File : KALLIKREIN

TITLE : Novel Human Kallikrein-Like Genes 

FIELD OF THE INVENTION 

The invention relates to nucleic acid molecules, proteins encoded by such nucleic acid
molecules, and use of the proteins and nucleic acid molecules.

BACKGROUND OF THE INVENTION 

Kallikreins and kallikrein-like proteins are a subgroup of the serine protease enzyme
family and exhibit a high degree of substrate specificity (1). The biological role of these
kallikreins is the selective cleavage of specific polypeptide precursors (substrates) to release
peptides with potent biological activity (2). In mouse and rat, kallikreins are encoded by large
multigene families. In the mouse genome, at least 24 genes have been identified (3). Expression
of 11 of these genes has been confirmed; the rest are presumed to be pseudogenes (4). A similar
family of 15-20 kallikreins has been found in the rat genome (5), where at least 4 of these are
known to be expressed (6).

Three human kallikrein genes have been described, i.e. prostatic specific antigen (PSA
or KLK3) (7), human glandular kallikrein (KLK2) (8) and tissue (pancreatic-renal) kallikrein
(KLK1) (9). The PSA gene spans 5.8 Kb of sequence which has been published (7); the KLK2
gene has a size of 5.2 Kb and its complete structure has also been elucidated (8). The KLK1
gene is approximately 4.5 Kb long and the exon sequences and the exon/intron junctions of this
gene have been determined (9).

The mouse kallikrein genes are clustered in groups of up to 11 genes on chromosome
7 and the distance between the genes in the various clusters can be as small as 3-7 Kb (3). All
three human kallikrein genes have been assigned to chromosome 19q13.2-19q13.4 and the
distance between PSA and KLK2 has been estimated to be 12 Kb (9).

A major difference between mouse and human kallikreins is that two of the human
kallikreins (KLK2 and KLK3) are expressed almost exclusively in the prostate while in animals
none of the kallikreins is localized in this organ. Other candidate new members of the human
kallikrein gene family include protease M (10) (also named Zyme (11) or neurosin (12)) and the
normal epithelial cell-specific gene-1 (NES1) (13). Both genes have been assigned to
chromosome 19q13.3 (10, 14) and show structural homology with other serine proteases and the
kallikrein gene family (10-14).

SUMMARY OF THE INVENTION

In efforts to precisely define the relative genomic location of the PSA, KLK2, Zyme and
NES1 genes, an area spanning approximately 300 Kb of contiguous sequence on human
chromosome 19 (19q13.3-q13.4) was examined. The present inventors were able to identify
the relative location of the known kallikrein genes and, in addition, they identified other
kallikrein-like genes which exhibit both location proximity and structural similarity with the
known members of the human kallikrein family. The novel genes exhibit homology with the
currently known members of the kallikrein family and they are co-localized in the same genomic
region. These new genes, like the already known kallikreins, have utility in various cancers
including breast, testicular, and prostate cancer.

The kallikrein-like proteins described herein are individually referred to as "KLK-L1 to
KLK-L5", and collectively as "kallikrein-like proteins" or "KLK-L Proteins". The genes
encoding the proteins are referred to as "klk-l1 to klk-l5" or "kallikrein-like genes" or "klk-l genes".

Broadly stated the present invention relates to an isolated nucleic acid molecule which 
comprises: 

(i) a nucleic acid sequence encoding a protein having substantial sequence identity,
preferably at least 60% sequence identity, with an amino acid sequence of KLK-L1
to KLK-L5 as shown in Tables 2 to 6;

(ii) a nucleic acid sequence encoding a protein comprising an amino acid
sequence of KLK-L1 to KLK-L5 as shown in Tables 2 to 6;

(iii) nucleic acid sequences complementary to (i);

(iv) a degenerate form of a nucleic acid sequence of (i);

(v) a nucleic acid sequence capable of hybridizing under stringent conditions to a
nucleic acid sequence in (i), (ii) or (iii);

(vi) a nucleic acid sequence encoding a truncation, an analog, an allelic or species
variation of a protein comprising an amino acid sequence of KLK-L1 to
KLK-L5 as shown in Tables 2 to 6; or

(vii) a fragment, or allelic or species variation of (i), (ii) or (iii). 

Preferably, a purified and isolated nucleic acid molecule of the invention comprises: 
(i) a nucleic acid sequence comprising the sequence of Figure 2, 3, 4, 5, or 6 wherein T can 
also be U; 



(ii) nucleic acid sequences complementary to (i), preferably complementary to the full 
nucleic acid sequence of Figure 2, 3, 4, 5, or 6; 

(iii) a nucleic acid capable of hybridizing under stringent conditions to a nucleic acid of (i) 
or (ii) and preferably having at least 18 nucleotides; or 

(iv) a nucleic acid molecule differing from any of the nucleic acids of (i) to (iii) in codon
sequences due to the degeneracy of the genetic code.

The invention also contemplates a nucleic acid molecule comprising a sequence
encoding a truncation of a KLK-L protein, an analog, or a homolog of a KLK-L Protein or a 
truncation thereof. (KLK-L Protein and truncations, analogs and homologs of the KLK-L 
Protein are also collectively referred to herein as "KLK-L Related Proteins"). 

The nucleic acid molecules of the invention may be inserted into an appropriate 
expression vector, i.e. a vector that contains the necessary elements for the transcription and
translation of the inserted coding sequence. Accordingly, recombinant expression vectors 
adapted for transformation of a host cell may be constructed which comprise a nucleic acid 
molecule of the invention and one or more transcription and translation elements linked to the 
nucleic acid molecule. 

The recombinant expression vector can be used to prepare transformed host cells
expressing KLK-L Related Proteins. Therefore, the invention further provides host cells
containing a recombinant molecule of the invention. The invention also contemplates transgenic
non-human mammals whose germ cells and somatic cells contain a recombinant molecule
comprising a nucleic acid molecule of the invention, in particular one which encodes an analog
of the KLK-L Protein, or a truncation of the KLK-L Protein.

The invention further provides a method for preparing KLK-L Related Proteins utilizing
the purified and isolated nucleic acid molecules of the invention. In an embodiment, a method
for preparing a KLK-L Related Protein is provided comprising (a) transferring a recombinant
expression vector of the invention into a host cell; (b) selecting transformed host cells from 
untransformed host cells; (c) culturing a selected transformed host cell under conditions which 
allow expression of the KLK-L Related Protein; and (d) isolating the KLK-L Related Protein. 

The invention further broadly contemplates an isolated KLK-L Protein comprising an
amino acid sequence as shown in Tables 2 to 6. 

The KLK-L Related Proteins of the invention may be conjugated with other molecules,
such as proteins, to prepare fusion proteins. This may be accomplished, for example, by the
synthesis of N-terminal or C-terminal fusion proteins.

The invention further contemplates antibodies having specificity against an epitope of 
a KLK-L Related Protein of the invention. Antibodies may be labeled with a detectable 
substance and used to detect proteins of the invention in tissues and cells. 

The invention also permits the construction of nucleotide probes which are unique to the 
nucleic acid molecules of the invention and/or to proteins of the invention. Therefore, the 
invention also relates to a probe comprising a nucleic acid sequence of the invention, or a 
nucleic acid sequence encoding a protein of the invention, or a part thereof. The probe may be 
labeled, for example, with a detectable substance and it may be used to select from a mixture 
of nucleotide sequences a nucleic acid molecule of the invention including nucleic acid 
molecules coding for a protein which displays one or more of the properties of a protein of the
invention. 

The invention still further provides a method for identifying a substance which binds to 
a protein of the invention comprising reacting the protein with at least one substance which 
potentially can bind with the protein, under conditions which permit the formation of complexes 
between the substance and protein and assaying for complexes, for free substance, or for non- 
complexed protein. The invention also contemplates methods for identifying substances that 
bind to other intracellular proteins that interact with a KLK-L Related Protein. Methods can also 
be utilized which identify compounds which bind to KLK-L gene regulatory sequences (e.g. 
promoter sequences). 

Still further the invention provides a method for evaluating a compound for its ability 
to modulate the biological activity of a KLK-L Related Protein of the invention. For example 
a substance which inhibits or enhances the interaction of the protein and a substance which 
binds to the protein may be evaluated. In an embodiment, the method comprises providing a 
known concentration of a KLK-L Related Protein, with a substance which binds to the protein 
and a test compound under conditions which permit the formation of complexes between the 
substance and protein, and removing and/or detecting complexes. 

Compounds which modulate the biological activity of a protein of the invention may 
also be identified using the methods of the invention by comparing the pattern and level of 
expression of the protein of the invention in tissues and cells, in the presence, and in the absence
of the compounds.

The substances and compounds identified using the methods of the invention, and 
peptides of the invention may be used to modulate the biological activity of a KLK-L Related 
Protein of the invention, and they may be used in the treatment of conditions such as cancer (e.g. 
breast, testicular, and prostate cancer). Accordingly, the substances and compounds may be 
formulated into compositions for administration to individuals suffering from cancer.

Therefore, the present invention also relates to a composition comprising one or more
of a protein of the invention, a peptide of the invention, or a substance or compound identified
using the methods of the invention, and a pharmaceutically acceptable carrier, excipient or
diluent. A method for treating or preventing cancer is also provided comprising administering 
to a patient in need thereof, a KLK-L Related Protein of the invention, or a composition of the 
invention. 

The present inventors have also identified a novel gene homologous to myelin
associated protein, designated UG. Therefore the invention provides an isolated nucleic acid
molecule which comprises:

(i) a nucleic acid sequence encoding a protein having substantial sequence identity,
preferably at least 60% sequence identity, with an amino acid sequence as shown
in Table 7;

(ii) a nucleic acid sequence encoding a protein comprising an amino acid
sequence as shown in Table 7;

(iii) nucleic acid sequences complementary to (i);

(iv) a degenerate form of a nucleic acid sequence of (i);

(v) a nucleic acid sequence capable of hybridizing under stringent conditions to a
nucleic acid sequence in (i), (ii) or (iii);

(vi) a nucleic acid sequence encoding a truncation, an analog, an allelic or species
variation of a protein comprising an amino acid sequence as shown in
Table 7; or

(vii) a fragment, or allelic or species variation of (i), (ii) or (iii). 

The invention further contemplates an isolated UG Protein comprising an amino acid 
sequence as shown in Table 7. 

The general description herein relating to the klk-l nucleic acid molecules and KLK-L
Proteins and KLK-L Related Proteins, antibodies, methods, and compositions is applicable to
the novel UG protein and nucleic acid molecule.

Other objects, features and advantages of the present invention will become apparent 
from the following detailed description. It should be understood, however, that the detailed 
description and the specific examples while indicating preferred embodiments of the invention 
are given by way of illustration only, since various changes and modifications within the spirit 
and scope of the invention will become apparent to those skilled in the art from this detailed 
description. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will now be described in relation to the drawings in which: 

Figure 1: An approximate 300 Kb of contiguous genomic sequence around
chromosome 19q13.3 - q13.4 represented by 8 contigs, each one shown with its length in Kb.
The contig numbers refer to those reported in the Lawrence Livermore National Laboratory
website. Note the localization of the seven known genes (PSA, KLK2, Zyme, NES1, HSCCE, 
neuropsin and TLSP) (see abbreviations for full names of these genes). All genes are 
represented with arrows denoting the direction of transcription. The gene with no homology to 
human kallikreins is termed UG (unknown gene). The five new kallikrein-like genes (KLK-L1 
to KLK-L5) were numbered from the most centromeric to the most telomeric. Numbers just 
below or just above the arrows indicate appropriate Kb lengths in each contig. The length of 
each of these genes may change in the future since not all exons were identified for each new 
gene, as shown in Tables 2-7. 

Figure 2 shows the nucleic acid sequence of KLK-L1; 

Figure 3 shows the nucleic acid sequence of KLK-L2; 

Figure 4 shows the nucleic acid sequence of KLK-L3; 

Figure 5 shows the nucleic acid sequence of KLK-L4; and 

Figure 6 shows the nucleic acid sequence of KLK-L5. 
DETAILED DESCRIPTION OF THE INVENTION 

In accordance with the present invention there may be employed conventional molecular
biology, microbiology, and recombinant DNA techniques within the skill of the art. Such
techniques are explained fully in the literature. See, for example, Sambrook, Fritsch, & Maniatis,
Molecular Cloning: A Laboratory Manual, Second Edition (1989), Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.; DNA Cloning: A Practical Approach, Volumes
I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1985); Transcription and Translation (B.D.
Hames & S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized
Cells and Enzymes (IRL Press, 1986); and B. Perbal, A Practical Guide to Molecular Cloning
(1984).

1. Nucleic Acid Molecules of the Invention 

As hereinbefore mentioned, the invention provides an isolated nucleic acid molecule
having a sequence encoding a KLK-L Protein. The term "isolated" refers to a nucleic acid
substantially free of cellular material or culture medium when produced by recombinant DNA
techniques, or chemical reactants, or other chemicals when chemically synthesized. An
"isolated" nucleic acid may also be free of sequences which naturally flank the nucleic acid (i.e.
sequences located at the 5' and 3' ends of the nucleic acid molecule) from which the nucleic acid
is derived. The term "nucleic acid" is intended to include DNA and RNA and can be either
double stranded or single stranded. In an embodiment, a nucleic acid molecule encodes a KLK-L
Protein comprising an amino acid sequence as shown in Tables 2 to 6, preferably a nucleic
acid molecule comprising a nucleic acid sequence as shown in Figure 2, 3, 4, 5, or 6.

The invention includes nucleic acid sequences complementary to a nucleic acid encoding
a KLK-L Protein comprising an amino acid sequence as shown in Tables 2 to 6, preferably the
nucleic acid sequences complementary to a full nucleic acid sequence shown in Figure 2, 3, 4,
5, or 6.

The invention includes nucleic acid molecules having substantial sequence identity or
homology to nucleic acid sequences of the invention or encoding proteins having substantial
identity or similarity to the amino acid sequence shown in Tables 2 to 9. Preferably, the nucleic
acids have substantial sequence identity, for example at least 40% nucleic acid identity; more
preferably 50% nucleic acid identity; and most preferably at least 60% to 80% sequence identity.

"Identity", as known in the art and used herein, is a relationship between two or more
amino acid sequences or two or more nucleic acid sequences, as determined by comparing the
sequences. It also refers to the degree of sequence relatedness between amino acid or nucleic
acid sequences, as the case may be, as determined by the match between strings of such
sequences. Identity and similarity are well known terms to skilled artisans and they can be
calculated by conventional methods (for example see Computational Molecular Biology, Lesk,
A.M. ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome
Projects, Smith, D.W. ed., Academic Press, New York, 1993; Computer Analysis of Sequence
Data, Part I, Griffin, A.M. and Griffin, H.G. eds., Humana Press, New Jersey, 1994; Sequence
Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis
Primer, Gribskov, M. and Devereux, J. eds., M. Stockton Press, New York, 1991; and Carillo, H.
and Lipman, D., SIAM J. Applied Math. 48:1073, 1988). Methods which are designed to give the
largest match between the sequences are generally preferred. Methods to determine identity and
similarity are codified in publicly available computer programs including the GCG program
package (Devereux J. et al., Nucleic Acids Research 12(1): 387, 1984); BLASTP, BLASTN,
and FASTA (Altschul, S.F. et al. J. Molec. Biol. 215: 403-410, 1990). The BLAST X program
is publicly available from NCBI and other sources (BLAST Manual, Altschul, S. et al., NCBI
NLM NIH Bethesda, Md. 20894; Altschul, S. et al. J. Mol. Biol. 215: 403-410, 1990).
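
By way of illustration only, a method "designed to give the largest match between the
sequences" may be sketched as a simple global alignment. The following Python sketch is not
any of the programs cited above; the function name global_align and the scoring values
(match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch style global alignment (illustrative scoring values).

    Returns the two input sequences padded with '-' so that they have equal length.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    top, bottom = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
```

The aligned strings returned by such a routine can then be compared column by column to count
identical positions. Production tools use substitution matrices and affine gap penalties, which
this sketch omits.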

Isolated nucleic acid molecules encoding a KLK-L Protein, and having a sequence which 
differs from a nucleic acid sequence of the invention due to degeneracy in the genetic code are 
also within the scope of the invention. Such nucleic acids encode functionally equivalent 
proteins (e.g., a KLK-L Protein) but differ in sequence from the sequence of a KLK-L Protein 
due to degeneracy in the genetic code. As one example, DNA sequence polymorphisms within 
the nucleotide sequence of a KLK-L Protein may result in silent mutations which do not affect 
the amino acid sequence. Variations in one or more nucleotides may exist among individuals 
within a population due to natural allelic variation. Any and all such nucleic acid variations are 
within the scope of the invention. DNA sequence polymorphisms may also occur which lead 
to changes in the amino acid sequence of a KLK-L Protein. These amino acid polymorphisms 
are also within the scope of the present invention. 
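
By way of illustration only, the effect of degeneracy in the genetic code may be sketched as
follows. The short sequences are invented for the example and are not KLK-L sequences; the
function names are hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 0; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_variant(reference: str, variant: str) -> str:
    """Label a nucleotide variant as silent or as changing the amino acid sequence."""
    if reference.upper() == variant.upper():
        return "identical"
    if translate(reference) == translate(variant):
        return "silent"
    return "amino acid change"


if __name__ == "__main__":
    reference = "ATGGCTCTG"    # hypothetical sequence encoding Met-Ala-Leu
    degenerate = "ATGGCCTTA"   # different codons, same protein
    polymorphic = "ATGGTTCTG"  # Ala codon changed to a Val codon
    print(translate(reference), classify_variant(reference, degenerate))
    print(translate(polymorphic), classify_variant(reference, polymorphic))
```

The first variant differs at two codons yet encodes the same protein, which corresponds to the
silent mutations described above; the second corresponds to an amino acid polymorphism.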

Another aspect of the invention provides a nucleic acid molecule which hybridizes under
stringent conditions, preferably high stringency conditions, to a nucleic acid molecule which
comprises a sequence which encodes a KLK-L Protein having an amino acid sequence shown
in Tables 2 to 6. Appropriate stringency conditions which promote DNA hybridization are
known to those skilled in the art, or can be found in Current Protocols in Molecular Biology,
John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0 x sodium chloride/sodium
citrate (SSC) at about 45°C, followed by a wash of 2.0 x SSC at 50°C may be employed. The
stringency may be selected based on the conditions used in the wash step. By way of example,
the salt concentration in the wash step can be selected from a high stringency of about 0.2 x SSC
at 50°C. In addition, the temperature in the wash step can be at high stringency conditions, at
about 65°C.
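
By way of illustration only, the relationship between wash conditions and stringency may be
sketched with an empirical melting temperature estimate. The formula below is one commonly
quoted approximation for long DNA duplexes and is an assumption of this example, not a
condition of the invention; other published variants use a different length correction. The
sodium concentration of 1 x SSC is taken as 0.195 M (0.15 M sodium chloride plus 0.015 M
trisodium citrate), and the probe sequence is invented.

```python
import math

# Assumed sodium ion concentration of 1 x SSC:
# 0.15 M NaCl + 0.015 M trisodium citrate (three Na+ per citrate) = 0.195 M.
SSC_1X_SODIUM_MOLAR = 0.195


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def approx_tm_celsius(seq: str, ssc_fold: float) -> float:
    """Rough duplex melting temperature at a given SSC strength (illustrative only).

    Uses Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 500/N, an empirical
    approximation assumed for this sketch.
    """
    sodium = SSC_1X_SODIUM_MOLAR * ssc_fold
    return 81.5 + 16.6 * math.log10(sodium) + 0.41 * gc_percent(seq) - 500.0 / len(seq)


if __name__ == "__main__":
    probe = "ATGGCTCTGAAACCCGGGTTTACGTACGATCGATCGGCTA" * 5  # hypothetical 200-mer
    for fold in (6.0, 2.0, 0.2):
        print(f"{fold} x SSC: estimated Tm {approx_tm_celsius(probe, fold):.1f} C")
```

Under this approximation a tenfold reduction in salt lowers the estimated melting temperature
by 16.6°C, so a wash at 0.2 x SSC leaves only closely matched duplexes intact at a given
temperature, consistent with its description above as a high stringency condition.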

It will be appreciated that the invention includes nucleic acid molecules encoding a
KLK-L Related Protein including truncations of a KLK-L Protein, and analogs of a KLK-L
Protein as described herein. It will further be appreciated that variant forms of the nucleic acid
molecules of the invention which arise by alternative splicing of an mRNA corresponding to a 
cDNA of the invention are encompassed by the invention. 

An isolated nucleic acid molecule of the invention which comprises DNA can be
isolated by preparing a labeled nucleic acid probe based on all or part of a nucleic acid
sequence of the invention. The labeled nucleic acid probe is used to screen an appropriate DNA
library (e.g. a cDNA or genomic DNA library). For example, a cDNA library can be used to
isolate a cDNA encoding a KLK-L Related Protein by screening the library with the labeled
probe using standard techniques. Alternatively, a genomic DNA library can be similarly
screened to isolate a genomic clone encompassing a gene encoding a KLK-L Related Protein.
Nucleic acids isolated by screening of a cDNA or genomic DNA library can be sequenced by
standard techniques.

An isolated nucleic acid molecule of the invention which is DNA can also be isolated
by selectively amplifying a nucleic acid encoding a KLK-L Related Protein using the
polymerase chain reaction (PCR) methods and cDNA or genomic DNA. It is possible to design
synthetic oligonucleotide primers from the nucleotide sequence of the invention for use in PCR.
A nucleic acid can be amplified from cDNA or genomic DNA using these oligonucleotide
primers and standard PCR amplification techniques. The nucleic acid so amplified can be
cloned into an appropriate vector and characterized by DNA sequence analysis. cDNA may be
prepared from mRNA, by isolating total cellular mRNA by a variety of techniques, for example,
by using the guanidinium-thiocyanate extraction procedure of Chirgwin et al., Biochemistry, 18,
5294-5299 (1979). cDNA is then synthesized from the mRNA using reverse transcriptase (for
example, Moloney MLV reverse transcriptase available from Gibco/BRL, Bethesda, MD, or
AMV reverse transcriptase available from Seikagaku America, Inc., St. Petersburg, FL).
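
By way of illustration only, the design of a pair of oligonucleotide primers from a known
nucleotide sequence may be sketched as follows. The function names are hypothetical, the
template is invented, and the default primer length of 18 nucleotides is borrowed from the
preferred minimum length stated above for hybridizing nucleic acids. Practical primer design
also considers melting temperature, secondary structure and specificity, which are omitted here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flanking_primers(template: str, length: int = 18):
    """Pick a forward and a reverse primer from the two ends of a template.

    The forward primer is identical to the 5' end of the given strand; the
    reverse primer is the reverse complement of its 3' end, so that it
    anneals to the given strand and is extended back toward the forward primer.
    """
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 18-nucleotide template, with short primers for readability.
    fwd, rev = flanking_primers("ATGCCCGGGAAATTTTAG", length=6)
    print("forward:", fwd)
    print("reverse:", rev)
```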

An isolated nucleic acid molecule of the invention which is RNA can be isolated by
cloning a cDNA encoding a KLK-L Related Protein into an appropriate vector which allows
for transcription of the cDNA to produce an RNA molecule which encodes a KLK-L Related 
Protein. For example, a cDNA can be cloned downstream of a bacteriophage promoter, (e.g. a 
T7 promoter) in a vector, cDNA can be transcribed in vitro with T7 polymerase, and the 
resultant RNA can be isolated by conventional techniques. 

Nucleic acid molecules of the invention may be chemically synthesized using standard 
techniques. Methods of chemically synthesizing polydeoxynucleotides are known, including but 
not limited to solid-phase synthesis which, like peptide synthesis, has been fully automated in 
commercially available DNA synthesizers (See e.g., Itakura et al. U.S. Patent No. 4,598,049;
Caruthers et al. U.S. Patent No. 4,458,066; and Itakura U.S. Patent Nos. 4,401,796 and
4,373,071).

Determination of whether a particular nucleic acid molecule encodes a KLK-L Related 
Protein can be accomplished by expressing the cDNA in an appropriate host cell by standard 
techniques, and testing the expressed protein in the methods described herein. A cDNA 
encoding a KLK-L Related Protein can be sequenced by standard techniques, such as 
dideoxynucleotide chain termination or Maxam-Gilbert chemical sequencing, to determine the 
nucleic acid sequence and the predicted amino acid sequence of the encoded protein. 

The initiation codon and untranslated sequences of a KLK-L Related Protein may be
determined using computer software designed for the purpose, such as PC/Gene (IntelliGenetics
Inc., Calif.). The intron-exon structure and the transcription regulatory sequences of a gene
encoding a KLK-L Related Protein may be confirmed by using a nucleic acid molecule of the 
invention encoding a KLK-L Related Protein to probe a genomic DNA clone library. 
Regulatory elements can be identified using standard techniques. The function of the elements 
can be confirmed by using these elements to express a reporter gene such as the lacZ gene which 
is operatively linked to the elements. These constructs may be introduced into cultured cells 
using conventional procedures or into non-human transgenic animal models. In addition to 
identifying regulatory elements in DNA, such constructs may also be used to identify nuclear 
proteins interacting with the elements, using techniques known in the art. 
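
By way of illustration only, the computational identification of a candidate initiation codon
may be sketched as a search for the longest open reading frame. This is a naive sketch and is
not the method used by the software named above; the function name longest_orf and the
example sequence are hypothetical, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str):
    """Find the longest ATG-to-stop open reading frame on the given strand.

    Returns (start, end) as 0-based, end-exclusive coordinates, where start is
    the position of the candidate initiation codon and end follows the stop
    codon. Returns None when no complete open reading frame is present.
    """
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the last stop codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


if __name__ == "__main__":
    sequence = "CCATGGCTCTGTAAGG"  # hypothetical sequence with flanking bases
    orf = longest_orf(sequence)
    if orf is not None:
        start, end = orf
        print("5' untranslated:", sequence[:start])
        print("open reading frame:", sequence[start:end])
        print("3' untranslated:", sequence[end:])
```

Sequence on either side of the reported coordinates corresponds to candidate untranslated
sequence. Experimental confirmation, as described above, remains necessary.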

In a particular embodiment of the invention, the nucleic acid molecules isolated using
the methods described herein are mutant KLK-L gene alleles. The mutant alleles may be
isolated from individuals either known or proposed to have a genotype which contributes to the
symptoms of cancer (e.g. breast, testicular, or prostate cancer). Mutant alleles and mutant allele
products may be used in therapeutic and diagnostic methods described herein. For example, a
cDNA of a mutant KLK-L gene may be isolated using PCR as described herein, and the DNA
sequence of the mutant allele may be compared to the normal allele to ascertain the mutation(s)
responsible for the loss or alteration of function of the mutant gene product. A genomic library
can also be constructed using DNA from an individual suspected of or known to carry a mutant
allele, or a cDNA library can be constructed using RNA from tissue known or suspected to
express the mutant allele. A nucleic acid encoding a normal KLK-L gene or any suitable
fragment thereof, may then be labeled and used as a probe to identify the corresponding mutant
allele in such libraries. Clones containing mutant sequences can be purified and subjected to
sequence analysis. In addition, an expression library can be constructed using cDNA from RNA
isolated from a tissue of an individual known or suspected to express a mutant KLK-L allele.
Gene products made by the putatively mutant tissue may be expressed and screened, for
example using antibodies specific for a KLK-L Related Protein as described herein. Library
clones identified using the antibodies can be purified and subjected to sequence analysis.

The sequence of a nucleic acid molecule of the invention, or a fragment of the molecule,
may be inverted relative to its normal presentation for transcription to produce an antisense
nucleic acid molecule. An antisense nucleic acid molecule may be constructed using chemical
synthesis and enzymatic ligation reactions using procedures known in the art.
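
By way of illustration only, the sequence of an antisense RNA corresponding to a given
coding (sense) DNA sequence may be derived as sketched below. The function name and the
input sequence are hypothetical.

```python
# Base pairing from a sense-strand DNA base to the RNA base that pairs with
# the corresponding mRNA base (A pairs with U, C with G, and so on).
PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def antisense_rna(sense_dna: str) -> str:
    """Return the antisense RNA, written 5' to 3', for a sense-strand DNA sequence."""
    return "".join(PAIR[base] for base in reversed(sense_dna.upper()))


if __name__ == "__main__":
    sense = "ATGGCT"  # hypothetical sense sequence; its mRNA would read AUGGCU
    print(antisense_rna(sense))
```

The result is the reverse complement of the sense sequence in the RNA alphabet, so that it
pairs in antiparallel orientation with the mRNA.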

2. Proteins of the Invention

An amino acid sequence of a KLK-L Protein comprises a sequence as shown in Tables
2 to 6.

In addition to proteins comprising an amino acid sequence as shown in Tables 2 to 6, the
proteins of the present invention include truncations of a KLK-L Protein, analogs of a KLK-L 
Protein, and proteins having sequence identity or similarity to a KLK-L Protein, and 
truncations thereof as described herein (i.e. KLK-L Related Proteins). Truncated proteins may 
comprise peptides of between 3 and 70 amino acid residues, ranging in size from a tripeptide 
to a 70 mer polypeptide. 

The truncated proteins may have an amino group (-NH2), a hydrophobic group (for
example, carbobenzoxyl, dansyl, or T-butyloxycarbonyl), an acetyl group, a
9-fluorenylmethoxy-carbonyl (FMOC) group, or a macromolecule including but not limited to
lipid-fatty acid conjugates, polyethylene glycol, or carbohydrates at the amino terminal end. The
truncated proteins may have a carboxyl group, an amido group, a T-butyloxycarbonyl group, or
a macromolecule including but not limited to lipid-fatty acid conjugates, polyethylene glycol,
or carbohydrates at the carboxy terminal end.

The proteins of the invention may also include analogs of a KLK-L Protein, and/or 
truncations thereof as described herein, which may include, but are not limited to a KLK-L 
Protein, containing one or more amino acid substitutions, insertions, and/or deletions. Amino 
acid substitutions may be of a conserved or non-conserved nature. Conserved amino acid 
substitutions involve replacing one or more amino acids of a KLK-L Protein amino acid 
sequence with amino acids of similar charge, size, and/or hydrophobicity characteristics. When 
only conserved substitutions are made the resulting analog is preferably functionally equivalent 
to a KLK-L Protein. Non-conserved substitutions involve replacing one or more amino acids 
of the KLK-L Protein amino acid sequence with one or more amino acids which possess 
dissimilar charge, size, and/or hydrophobicity characteristics. 
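
By way of illustration only, the distinction between conserved and non-conserved
substitutions may be sketched by grouping amino acids into coarse physicochemical classes.
The grouping below is one hypothetical scheme adopted for the example; the invention does not
prescribe a particular classification, and the peptide sequences are invented.

```python
# Hypothetical, coarse grouping of the twenty amino acids (one-letter codes).
CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar uncharged": "STNQ",
    "positively charged": "KRH",
    "negatively charged": "DE",
    "special": "CGP",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def substitutions(reference: str, analog: str):
    """List substitutions between two equal-length sequences.

    Each entry is (position, reference residue, analog residue, kind), with
    1-based positions and kind either 'conserved' or 'non-conserved'.
    """
    if len(reference) != len(analog):
        raise ValueError("sequences must have equal length (substitutions only)")
    found = []
    for position, (ref, alt) in enumerate(zip(reference.upper(), analog.upper()), start=1):
        if ref != alt:
            kind = "conserved" if CLASS_OF[ref] == CLASS_OF[alt] else "non-conserved"
            found.append((position, ref, alt, kind))
    return found


if __name__ == "__main__":
    for entry in substitutions("MKLDE", "MRLAE"):
        print(entry)
```

In this example the replacement of lysine by arginine keeps a positively charged residue and is
reported as conserved, while the replacement of aspartic acid by alanine removes a charge and
is reported as non-conserved.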

One or more amino acid insertions may be introduced into a KLK-L Protein. Amino acid
insertions may consist of single amino acid residues or sequential amino acids ranging from 2
to 15 amino acids in length.

Deletions may consist of the removal of one or more amino acids, or discrete portions 
from a KLK-L Protein sequence. The deleted amino acids may or may not be contiguous. The 
lower limit length of the resulting analog with a deletion mutation is about 10 amino acids, 
preferably 20 to 40 amino acids. 

The proteins of the invention include proteins with sequence identity or similarity to a 
KLK-L Protein and/or truncations thereof as described herein. Such KLK-L Proteins include 
proteins whose amino acid sequences are comprised of the amino acid sequences of KLK-L 
Protein regions from other species that hybridize under selected hybridization conditions (see 
discussion of stringent hybridization conditions herein) with a probe used to obtain a KLK-L 
Protein. These proteins will generally have the same regions which are characteristic of a KLK-L
Protein. Preferably a protein will have substantial sequence identity, for example, about 50%
identity, preferably 70 to 80% identity, more preferably at least 90% to 95% identity, and most
preferably 98% identity with an amino acid sequence shown in Tables 2 to 6.

A percent amino acid sequence homology, similarity or identity is calculated as the
percentage of aligned amino acids that match the reference sequence using known methods as
described herein.
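
By way of illustration only, a percent identity calculation over two sequences that have
already been aligned may be sketched as follows. The choice of denominator (the number of
residues in the reference sequence) is an assumption of this example, since programs differ
on this point; the function name and sequences are hypothetical.

```python
def percent_identity(aligned_reference: str, aligned_query: str) -> float:
    """Percent of reference residues matched by the query in a given alignment.

    Both inputs must already be aligned to equal length, with '-' marking gaps.
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    reference_residues = sum(1 for r in aligned_reference if r != "-")
    if reference_residues == 0:
        raise ValueError("reference sequence contains no residues")
    matches = sum(
        1
        for r, q in zip(aligned_reference, aligned_query)
        if r != "-" and r == q
    )
    return 100.0 * matches / reference_residues


if __name__ == "__main__":
    # Hypothetical five-residue reference aligned against two queries.
    print(percent_identity("MKLDE", "MRLDE"))
    print(percent_identity("MK-LDE", "MKALD-"))
```

A value computed in this way can be compared against the identity thresholds recited above.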

The invention also contemplates isoforms of the proteins of the invention. An isoform 
contains the same number and kinds of amino acids as a protein of the invention, but the 
isoform has a different molecular structure. Isoforms contemplated by the present invention
preferably have the same properties as a protein of the invention as described herein. 

The present invention also includes KLK-L Related Proteins conjugated with a selected
protein, or a marker protein (see below) to produce fusion proteins. Additionally, immunogenic
portions of a KLK-L Protein and a KLK-L Related Protein are within the scope of the
invention.

A KLK-L Related Protein of the invention may be prepared using recombinant DNA 
methods. Accordingly, the nucleic acid molecules of the present invention having a sequence 
which encodes a KLK-L Related Protein of the invention may be incorporated in a known 
manner into an appropriate expression vector which ensures good expression of the protein. 
Possible expression vectors include but are not limited to cosmids, plasmids, or modified 
viruses (e.g. replication defective retroviruses, adenoviruses and adeno-associated viruses), so 
long as the vector is compatible with the host cell used.

The invention therefore contemplates a recombinant expression vector of the invention
containing a nucleic acid molecule of the invention and the necessary regulatory sequences for
the transcription and translation of the inserted protein-sequence. Suitable regulatory sequences
may be derived from a variety of sources, including bacterial, fungal, viral, mammalian, or
insect genes (for example, see the regulatory sequences described in Goeddel, Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, CA (1990)). Selection
of appropriate regulatory sequences is dependent on the host cell chosen as discussed below, and 
may be readily accomplished by one of ordinary skill in the art. The necessary regulatory 
sequences may be supplied by the native KLK-L Protein and/or its flanking regions. 

The invention further provides a recombinant expression vector comprising a DNA 
nucleic acid molecule of the invention cloned into the expression vector in an antisense 
orientation. That is, the DNA molecule is linked to a regulatory sequence in a manner which
allows for expression, by transcription of the DNA molecule, of an RNA molecule which is
antisense to the nucleic acid sequence of a protein of the invention or a fragment thereof.
Regulatory sequences linked to the antisense nucleic acid can be chosen which direct the
continuous expression of the antisense RNA molecule in a variety of cell types, for instance a 
viral promoter and/or enhancer, or regulatory sequences can be chosen which direct tissue or 
cell type specific expression of antisense RNA. 
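
The relationship between a sense coding sequence and the antisense RNA transcribed from an insert cloned in the antisense orientation may be illustrated by the following minimal sketch. The function names and the six-nucleotide example sequence are hypothetical and serve only to show the reverse-complement relationship; they do not correspond to any sequence of the invention.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """Return the RNA (5' to 3') that is complementary to the sense-strand transcript.

    An insert cloned in the antisense orientation is transcribed into this RNA,
    which can base-pair with the mRNA derived from the sense strand.
    """
    return reverse_complement(sense_dna).replace("T", "U").replace("t", "u")


# Hypothetical sense fragment; its mRNA would read AUGGCC.
example_antisense = antisense_rna("ATGGCC")
```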

The recombinant expression vectors of the invention may also contain a marker gene 
which facilitates the selection of host cells transformed or transfected with a recombinant 
molecule of the invention. Examples of marker genes are genes which confer resistance to
certain drugs such as G418 and hygromycin, and genes encoding β-galactosidase, chloramphenicol
acetyltransferase, firefly luciferase, or an immunoglobulin or portion thereof such as the Fc
portion of an immunoglobulin, preferably IgG. The markers can be introduced on a separate
vector from the nucleic acid of interest. 

The recombinant expression vectors may also contain genes which encode a fusion 
moiety which provides increased expression of the recombinant protein; increased solubility of 
the recombinant protein; and aid in the purification of the target recombinant protein by acting 
as a ligand in affinity purification. For example, a proteolytic cleavage site may be added to the 
target recombinant protein to allow separation of the recombinant protein from the fusion 
moiety subsequent to purification of the fusion protein. Typical fusion expression vectors 
include pGEX (Amrad Corp., Melbourne, Australia), pMAL (New England Biolabs, Beverly, 
MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione S-transferase (GST),
maltose E binding protein, or protein A, respectively, to the recombinant protein. 

The recombinant expression vectors may be introduced into host cells to produce a 
transformant host cell. "Transformant host cells" include host cells which have been transformed 
or transfected with a recombinant expression vector of the invention. The terms "transformed 
with", "transfected with", "transformation" and "transfection" encompass the introduction of a 
nucleic acid (e.g. a vector) into a cell by one of many standard techniques. Prokaryotic cells can 
be transformed with a nucleic acid by, for example, electroporation or calcium-chloride 
mediated transformation. A nucleic acid can be introduced into mammalian cells via 
conventional techniques such as calcium phosphate or calcium chloride co-precipitation, DEAE-
dextran-mediated transfection, lipofectin, electroporation or microinjection. Suitable methods
for transforming and transfecting host cells can be found in Sambrook et al. (Molecular Cloning: 
A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other
laboratory textbooks.

Suitable host cells include a wide variety of prokaryotic and eukaryotic host cells. For 
example, the proteins of the invention may be expressed in bacterial cells such as E. coli, insect 
cells (using baculovirus), yeast cells or mammalian cells. Other suitable host cells can be found 
in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San
Diego, CA (1990).

A host cell may also be chosen which modulates the expression of an inserted nucleic 
acid sequence, or modifies (e.g. glycosylation or phosphorylation) and processes (e.g. cleaves) 
the protein in a desired fashion. Host systems or cell lines may be selected which have specific 
and characteristic mechanisms for post-translational processing and modification of proteins. 
For example, eukaryotic host cells including CHO, VERO, BHK, HeLa, COS, MDCK, 293,
3T3, and WI38 may be used. For long-term high-yield stable expression of the protein, cell lines
and host systems which stably express the gene product may be engineered.

Host cells and in particular cell lines produced using the methods described herein may
be particularly useful in screening and evaluating compounds that modulate the activity of a
KLK-L Related Protein.

The proteins of the invention may also be expressed in non-human transgenic animals 
including but not limited to mice, rats, rabbits, guinea pigs, micro-pigs, goats, sheep, pigs, non-
human primates (e.g. baboons, monkeys, and chimpanzees) [see Hammer et al. (Nature
315:680-683, 1985), Palmiter et al. (Science 222:809-814, 1983), Brinster et al. (Proc. Natl.
Acad. Sci. USA, 82:4438-4442, 1985), Palmiter and Brinster (Cell 41:343-345, 1985) and U.S.
Patent No. 4,736,866]. Procedures known in the art may be used to introduce a nucleic acid
molecule of the invention encoding a KLK-L Related Protein into animals to produce the 
founder lines of transgenic animals. Such procedures include pronuclear microinjection, 
retrovirus mediated gene transfer into germ lines, gene targeting in embryonic stem cells, 
electroporation of embryos, and sperm-mediated gene transfer. 

The present invention contemplates transgenic animals that carry the KLK-L gene in
all their cells, and animals which carry the transgene in some but not all their cells. The
transgene may be integrated as a single transgene or in concatamers. The transgene may be
selectively introduced into and activated in specific cell types (see, for example, Lasko et al.,
1992, Proc. Natl. Acad. Sci. USA 89: 6236). The transgene may be integrated into the
chromosomal site of the endogenous gene by gene targeting. The transgene may be selectively
introduced into a particular cell type inactivating the endogenous gene in that cell type (see Gu
et al., Science 265: 103-106).

The expression of a recombinant KLK-L Related Protein in a transgenic animal may be 
assayed using standard techniques. Initial screening may be conducted by Southern Blot 
analysis, or PCR methods to analyze whether the transgene has been integrated. The level of 
mRNA expression in the tissues of transgenic animals may also be assessed using techniques 
including Northern blot analysis of tissue samples, in situ hybridization, and RT-PCR. Tissue 
may also be evaluated immunocytochemically using antibodies against KLK-L Protein. 

Proteins of the invention may also be prepared by chemical synthesis using techniques 
well known in the chemistry of proteins such as solid phase synthesis (Merrifield, 1964, J. Am.
Chem. Assoc. 85:2149-2154) or synthesis in homogenous solution (Houbenweyl, 1987,
Methods of Organic Chemistry, ed. E. Wansch, Vol. 15 I and II, Thieme, Stuttgart).

N-terminal or C-terminal fusion proteins comprising a KLK-L Related Protein of the 
invention conjugated with other molecules, such as proteins, may be prepared by fusing, through 
recombinant techniques, the N-terminal or C-terminal of a KLK-L Related Protein, and the
sequence of a selected protein or marker protein with a desired biological function. The resultant 
fusion proteins contain KLK-L Protein fused to the selected protein or marker protein as 
described herein. Examples of proteins which may be used to prepare fusion proteins include 
immunoglobulins, glutathione-S-transferase (GST), hemagglutinin (HA), and truncated myc. 
3. Antibodies 

KLK-L Related Proteins of the invention can be used to prepare antibodies specific for 
the proteins. Antibodies can be prepared which bind a distinct epitope in an unconserved region
of the protein. An unconserved region of the protein is one that does not have substantial
sequence homology to other proteins. A conserved region such as a well-
characterized domain can also be used to prepare an antibody to a conserved region of a KLK-L
Related Protein. Antibodies having specificity for a KLK-L Related Protein may also be raised 
from fusion proteins created by expressing fusion proteins in bacteria as described herein. 

The invention can employ intact monoclonal or polyclonal antibodies, and 
immunologically active fragments (e.g. a Fab, (Fab)2 fragment, or Fab expression library
fragments and epitope-binding fragments thereof), an antibody heavy chain, an antibody light
chain, a genetically engineered single chain Fv molecule (Ladner et al., U.S. Pat. No. 4,946,778),
or a chimeric antibody, for example, an antibody which contains the binding specificity of a 
murine antibody, but in which the remaining portions are of human origin. Antibodies including 
monoclonal and polyclonal antibodies, fragments and chimeras, may be prepared using methods 
known to those skilled in the art.

4. Applications of the Nucleic Acid Molecules, KLK-L Related Proteins, and
Antibodies of the Invention

The nucleic acid molecules, KLK-L Related Proteins, and antibodies of the invention 
may be used in the prognostic and diagnostic evaluation of cancer (e.g. breast, testicular, and 
prostate cancer), and the identification of subjects with a predisposition to cancer (Section 4.1.1 
and 4.1.2). Methods for detecting nucleic acid molecules and KLK-L Related Proteins of the 
invention can be used to monitor cancer by detecting KLK-L Related Proteins and nucleic acid
molecules encoding KLK-L Related Proteins. It would also be apparent to one skilled in the art 
that the methods described herein may be used to study the developmental expression of KLK-L 
Related Proteins and, accordingly, will provide further insight into the role of KLK-L Related 
Proteins. The applications of the present invention also include methods for the identification
of compounds that modulate the biological activity of KLK-L or KLK-L Related Proteins
(Section 4.2). The compounds, antibodies etc. may be used for the treatment of cancer (Section
4.3). 

4.1 Diagnostic Methods 

A variety of methods can be employed for the diagnostic and prognostic evaluation of
cancer (e.g. breast, testicular, and prostate cancer), and the identification of subjects with a 
predisposition to cancer. Such methods may, for example, utilize nucleic acid molecules of the 
invention, and fragments thereof, and antibodies directed against KLK-L Related Proteins,
including peptide fragments. In particular, the nucleic acids and antibodies may be used, for 
example, for (1) the detection of the presence of KLK-L mutations, or the detection of either 
over- or under-expression of KLK-L mRNA relative to a non-disorder state or the qualitative 
or quantitative detection of alternatively spliced forms of KLK-L transcripts which may
correlate with certain conditions or susceptibility toward such conditions; and (2) the detection
of either an over- or an under-abundance of KLK-L Related Proteins relative to a non-disorder
state or the presence of a modified (e.g., less than full-length) KLK-L Protein which correlates
with a disorder state, or a progression toward a disorder state.

The methods described herein may be performed by utilizing pre-packaged diagnostic 
kits comprising at least one specific KLK-L nucleic acid or antibody described herein, which 
may be conveniently used, e.g., in clinical settings, to screen and diagnose patients and to screen 
and identify those individuals exhibiting a predisposition to developing a disorder. 

Nucleic acid-based detection techniques are described, below, in Section 4.1.1. Peptide 
detection techniques are described, below, in Section 4.1.2. The samples that may be analyzed 
using the methods of the invention include those which are known or suspected to express KLK- 
L or contain KLK-L Related Proteins. The samples may be derived from a patient or a cell 
culture, and include but are not limited to biological fluids, tissue extracts, freshly harvested 
cells, and lysates of cells which have been incubated in cell cultures. 
4.1.1 Methods for Detecting Nucleic Acid Molecules of the Invention 

The nucleic acid molecules of the invention allow those skilled in the art to construct 
nucleotide probes for use in the detection of nucleic acid sequences of the invention in samples. 
Suitable probes include nucleic acid molecules based on nucleic acid sequences encoding at 
least 5 sequential amino acids from regions of the KLK-L Protein; preferably they comprise
15 to 30 nucleotides. A nucleotide probe may be labeled with a detectable substance such as a
radioactive label which provides for an adequate signal and has sufficient half-life such as 32P,
3H, 14C or the like. Other detectable substances which may be used include antigens that are
recognized by a specific labeled antibody, fluorescent compounds, enzymes, antibodies specific 
for a labeled antigen, and luminescent compounds. An appropriate label may be selected having 
regard to the rate of hybridization and binding of the probe to the nucleotide to be detected and 
the amount of nucleotide available for hybridization. Labeled probes may be hybridized to 
nucleic acids on solid supports such as nitrocellulose filters or nylon membranes as generally 
described in Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual (2nd ed.). The
nucleic acid probes may be used to detect genes, preferably in human cells, that encode KLK-L 
Related Proteins. The nucleotide probes may also be useful in the diagnosis of cancer; in 
monitoring the progression of cancer, or monitoring a therapeutic treatment. 
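
The probe dimensions stated above (a sequence encoding at least 5 sequential amino acids, that is at least 15 nucleotides, and preferably 15 to 30 nucleotides in total) may be illustrated by the following sketch, which lists the in-frame windows of a coding sequence that satisfy both limits. The function name, its parameters and the 18-nucleotide example are hypothetical and are not sequences of the invention; probe selection in practice would also have regard to specificity and hybridization conditions, which this sketch does not model.

```python
from typing import Iterator, Tuple


def candidate_probes(
    coding_dna: str, min_codons: int = 5, max_nt: int = 30
) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs for in-frame probe candidates.

    Each window starts on a codon boundary, encodes at least `min_codons`
    amino acids, and is no longer than `max_nt` nucleotides.
    """
    min_nt = 3 * min_codons  # 5 codons correspond to 15 nucleotides
    n = len(coding_dna)
    for start in range(0, n - min_nt + 1, 3):
        longest = min(max_nt, n - start)
        for length in range(min_nt, longest + 1, 3):
            yield start, coding_dna[start:start + length]


# Hypothetical 18-nucleotide coding fragment (six codons).
probes = list(candidate_probes("ATGGCCAAGCTGTTCGGA"))
```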

The probe may be used in hybridization techniques to detect genes that encode KLK-L 
Related Proteins. The technique generally involves contacting and incubating nucleic acids (e.g. 
recombinant DNA molecules, cloned genes) obtained from a sample from a patient or other
cellular source with a probe of the present invention under conditions favorable for the specific
annealing of the probes to complementary sequences in the nucleic acids. After incubation, the
non-annealed nucleic acids are removed, and the presence of nucleic acids that have hybridized
to the probe, if any, is detected.

The detection of nucleic acid molecules of the invention may involve the amplification
of specific gene sequences using an amplification method such as PCR, followed by the analysis
of the amplified molecules using techniques known to those skilled in the art. Suitable primers
can be routinely designed by one of skill in the art.

Genomic DNA may be used in hybridization or amplification assays of biological 
samples to detect abnormalities involving klk-l structure, including point mutations, insertions,
deletions, and chromosomal rearrangements. For example, direct sequencing, single stranded
conformational polymorphism analyses, heteroduplex analysis, denaturing gradient gel
electrophoresis, chemical mismatch cleavage, and oligonucleotide hybridization may be utilized. 

Genotyping techniques known to one skilled in the art can be used to type
polymorphisms that are in close proximity to the mutations in a klk-l gene. The polymorphisms
may be used to identify individuals in families that are likely to carry mutations. If a
polymorphism exhibits linkage disequilibrium with mutations in a klk-l gene, it can also be used
to screen for individuals in the general population likely to carry mutations. Polymorphisms
which may be used include restriction fragment length polymorphisms (RFLPs), single-base
polymorphisms, and simple sequence repeat polymorphisms (SSLPs).

A probe of the invention may be used to directly identify RFLPs. A probe or primer of
the invention can additionally be used to isolate genomic clones such as YACs, BACs, PACs, 
cosmids, phage or plasmids. The DNA in the clones can be screened for SSLPs using 
hybridization or sequencing procedures. 

Hybridization and amplification techniques described herein may be used to assay 
qualitative and quantitative aspects of klk-l expression. For example, RNA may be isolated from 
a cell type or tissue known to express klk-l and tested utilizing the hybridization (e.g. standard 
Northern analyses) or PCR techniques referred to herein. The techniques may be used to detect 
differences in transcript size which may be due to normal or abnormal alternative splicing. The 
techniques may be used to detect quantitative differences between levels of full length and/or
alternatively spliced transcripts detected in normal individuals relative to those individuals
exhibiting cancer symptoms or other disease conditions.

The primers and probes may be used in the above described methods in situ, i.e. directly
on tissue sections (fixed and/or frozen) of patient tissue obtained from biopsies or resections. 
4.1.2 Methods for Detecting KLK-L Related Proteins 

Antibodies specifically reactive with a KLK-L Related Protein, or derivatives, such as 
enzyme conjugates or labeled derivatives, may be used to detect KLK-L Related Proteins in 
various samples (e.g. biological materials). They may be used as diagnostic or prognostic 
reagents and they may be used to detect abnormalities in the level of KLK-L Related Proteins 
expression, or abnormalities in the structure, and/or temporal, tissue, cellular, or subcellular 
location of a KLK-L Related Protein. Antibodies may also be used to screen potentially 
therapeutic compounds in vitro to determine their effects on cancer, and other conditions. In 
vitro immunoassays may also be used to assess or monitor the efficacy of particular therapies.
The antibodies of the invention may also be used in vitro to determine the level of KLK-L 
expression in cells genetically engineered to produce a KLK-L Related Protein. 

The antibodies may be used in any known immunoassays which rely on the binding 
interaction between an antigenic determinant of a KLK-L Related Protein and the antibodies. 
Examples of such assays are radioimmunoassays, enzyme immunoassays (e.g. ELISA), 
immunofluorescence, immunoprecipitation, latex agglutination, hemagglutination, and 
histochemical tests. The antibodies may be used to detect and quantify KLK-L Related Proteins 
in a sample in order to determine its role in particular cellular events or pathological states, and 
to diagnose and treat such pathological states. 

In particular, the antibodies of the invention may be used in immuno-histochemical 
analyses, for example, at the cellular and subcellular level, to detect a KLK-L Related
Protein, to localize it to particular cells and tissues, and to specific subcellular locations, and to 
quantitate the level of expression. 

Cytochemical techniques known in the art for localizing antigens using light and 
electron microscopy may be used to detect a KLK-L Related Protein. Generally, an antibody 
of the invention may be labeled with a detectable substance and a KLK-L Related Protein may 
be localised in tissues and cells based upon the presence of the detectable substance. Examples 
of detectable substances include, but are not limited to, the following: radioisotopes (e.g., 3H,
14C, 35S, 125I, 131I), fluorescent labels (e.g., FITC, rhodamine, lanthanide phosphors), luminescent
labels such as luminol, enzymatic labels (e.g., horseradish peroxidase, beta-galactosidase,
luciferase, alkaline phosphatase, acetylcholinesterase), biotinyl groups (which can be detected
by marked avidin e.g., streptavidin containing a fluorescent marker or enzymatic activity that
can be detected by optical or colorimetric methods), predetermined polypeptide epitopes
recognized by a secondary reporter (e.g., leucine zipper pair sequences, binding sites for
secondary antibodies, metal binding domains, epitope tags). In some embodiments, labels are
attached via spacer arms of various lengths to reduce potential steric hindrance. Antibodies may
also be coupled to electron dense substances, such as ferritin or colloidal gold, which are readily 
visualised by electron microscopy. 

The antibody or sample may be immobilized on a carrier or solid support which is 
capable of immobilizing cells, antibodies etc. For example, the carrier or support may be 
nitrocellulose, or glass, polyacrylamides, gabbros, and magnetite. The support material may
have any possible configuration including spherical (e.g. bead), cylindrical (e.g. inside surface 
of a test tube or well, or the external surface of a rod), or flat (e.g. sheet, test strip). Indirect 
methods may also be employed in which the primary antigen-antibody reaction is amplified by
the introduction of a second antibody, having specificity for the antibody reactive against KLK-
L Related Protein. By way of example, if the antibody having specificity against a KLK-L
Related Protein is a rabbit IgG antibody, the second antibody may be goat anti-rabbit gamma-
globulin labeled with a detectable substance as described herein.

Where a radioactive label is used as a detectable substance, a KLK-L Related Protein
may be localized by radioautography. The results of radioautography may be quantitated by
determining the density of particles in the radioautographs by various optical methods, or by 
counting the grains. 

4.2 Methods for Identifying or Evaluating Substances/Compounds
The methods described herein are designed to identify substances that modulate the 
biological activity of a KLK-L Related Protein including substances that bind to KLK-L 
Related Proteins, or bind to other proteins that interact with a KLK-L Related Protein, and
compounds that interfere with, or enhance the interaction of a KLK-L Related Protein and
substances that bind to the KLK-L Related Protein or other proteins that interact with a KLK-L
Related Protein. Methods are also utilized that identify compounds that bind to KLK-L
regulatory sequences.

The substances and compounds identified using the methods of the invention include
but are not limited to peptides such as soluble peptides including Ig-tailed fusion peptides, 
members of random peptide libraries and combinatorial chemistry-derived molecular libraries 
made of D- and/or L-configuration amino acids, phosphopeptides (including members of
random or partially degenerate, directed phosphopeptide libraries), antibodies [e.g. polyclonal,
monoclonal, humanized, anti-idiotypic, chimeric, single chain antibodies, fragments, (e.g. Fab,
F(ab)2, and Fab expression library fragments, and epitope-binding fragments thereof)], and small
organic or inorganic molecules. The substance or compound may be an endogenous 
physiological compound or it may be a natural or synthetic compound. 

Substances which modulate a KLK-L Related Protein can be identified based on their 
ability to bind to a KLK-L Related Protein. Therefore, the invention also provides methods for 
identifying substances which bind to a KLK-L Related Protein. Substances identified using the
methods of the invention may be isolated, cloned and sequenced using conventional techniques. 

Substances which can bind with a KLK-L Related Protein may be identified by reacting 
a KLK-L Related Protein with a test substance which potentially binds to a KLK-L Related 
Protein, under conditions which permit the formation of substance-KLK-L Related Protein 
complexes and removing and/or detecting the complexes. The complexes can be detected by 
assaying for substance-KLK-L Related Protein complexes, for free substance, or for non- 
complexed KLK-L Related Protein. Conditions which permit the formation of substance-KLK- 
L Related Protein complexes may be selected having regard to factors such as the nature and 
amounts of the substance and the protein. 

The substance-protein complex, free substance or non-complexed proteins may be 
isolated by conventional isolation techniques, for example, salting out, chromatography, 
electrophoresis, gel filtration, fractionation, absorption, polyacrylamide gel electrophoresis, 
agglutination, or combinations thereof. To facilitate the assay of the components, antibody 
against KLK-L Related Protein or the substance, or labeled KLK-L Related Protein, or a 
labeled substance may be utilized. The antibodies, proteins, or substances may be labeled with 
a detectable substance as described above. 

A KLK-L Related Protein, or the substance used in the method of the invention may be 
insolubilized. For example, a KLK-L Related Protein, or substance may be bound to a suitable 
carrier such as agarose, cellulose, dextran, Sephadex, Sepharose, carboxymethyl cellulose,
polystyrene, filter paper, ion-exchange resin, plastic film, plastic tube, glass beads, polyamine-
methyl vinyl-ether-maleic acid copolymer, amino acid copolymer, ethylene-maleic acid
copolymer, nylon, silk, etc. The carrier may be in the shape of, for example, a tube, test plate, 
beads, disc, sphere etc. The insolubilized protein or substance may be prepared by reacting the 
material with a suitable insoluble carrier using known chemical or physical methods, for
example, cyanogen bromide coupling. 

The invention also contemplates a method for evaluating a compound for its ability to 
modulate the biological activity of a KLK-L Related Protein of the invention, by assaying for 
an agonist or antagonist (i.e. enhancer or inhibitor) of the binding of a KLK-L Related Protein 
with a substance which binds with a KLK-L Related Protein. The basic method for evaluating 
if a compound is an agonist or antagonist of the binding of a KLK-L Related Protein and a 
substance that binds to the protein, is to prepare a reaction mixture containing the KLK-L
Related Protein and the substance under conditions which permit the formation of substance- 
KLK-L Related Protein complexes, in the presence of a test compound. The test compound may 
be initially added to the mixture, or may be added subsequent to the addition of the KLK-L 
Related Protein and substance. Control reaction mixtures without the test compound or with a
placebo are also prepared. The formation of complexes is detected, and the formation of
complexes in the control reaction but not in the reaction mixture indicates that the test
compound interferes with the interaction of the KLK-L Related Protein and substance. The
reactions may be carried out in the liquid phase or the KLK-L Related Protein, substance, or
test compound may be immobilized as described herein. The ability of a compound to modulate
the biological activity of a KLK-L Related Protein of the invention may be tested by 
determining the biological effects on cells. 
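
The comparison between the control reaction and the reaction containing the test compound may be expressed numerically, as in the following sketch. The signal values, the 20% threshold and the function names are assumptions chosen for illustration; the specification does not prescribe a particular readout or cut-off, and any threshold used in practice would depend on the variability of the assay.

```python
def percent_change_in_complexes(control_signal: float, test_signal: float) -> float:
    """Return the change in complex formation relative to the control, in percent.

    Negative values mean fewer complexes formed in the presence of the test compound.
    """
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (test_signal - control_signal) / control_signal


def classify_compound(change_percent: float, threshold: float = 20.0) -> str:
    """Label a test compound from its effect on complex formation.

    The threshold is a hypothetical cut-off, not a value taken from the specification.
    """
    if change_percent <= -threshold:
        return "antagonist-like (inhibits binding)"
    if change_percent >= threshold:
        return "agonist-like (enhances binding)"
    return "no clear effect"


# Hypothetical readouts: control mixture 100 units, test mixture 40 units.
change = percent_change_in_complexes(100.0, 40.0)
label = classify_compound(change)
```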

It will be understood that the agonists and antagonists i.e. inhibitors and enhancers that 
can be assayed using the methods of the invention may act on one or more of the binding sites 
on the protein or substance including agonist binding sites, competitive antagonist binding sites, 
non-competitive antagonist binding sites or allosteric sites. 

The invention also makes it possible to screen for antagonists that inhibit the effects of 
an agonist of the interaction of KLK-L Related Protein with a substance which is capable of 
binding to the KLK-L Related Protein. Thus, the invention may be used to assay for a
compound that competes for the same binding site of a KLK-L Related Protein.

The invention also contemplates methods for identifying compounds that bind to
proteins that interact with a KLK-L Related Protein. Protein-protein interactions may be 
identified using conventional methods such as co-immunoprecipitation, crosslinking and co- 
purification through gradients or chromatographic columns. Methods may also be employed that 
result in the simultaneous identification of genes which encode proteins interacting with a KLK-
L Related Protein. These methods include probing expression libraries with labeled KLK-L
Related Protein. 

Two-hybrid systems may also be used to detect protein interactions in vivo. Generally, 
plasmids are constructed that encode two hybrid proteins. A first hybrid protein consists of the 
DNA-binding domain of a transcription activator protein fused to a KLK-L Related Protein, and
the second hybrid protein consists of the transcription activator protein's activator domain fused
to an unknown protein encoded by a cDNA which has been recombined into the plasmid as part
of a cDNA library. The plasmids are transformed into a strain of yeast (e.g. S. cerevisiae) that
contains a reporter gene (e.g. lacZ, luciferase, alkaline phosphatase, horseradish peroxidase)
whose regulatory region contains the transcription activator's binding site. The hybrid proteins
alone cannot activate the transcription of the reporter gene. However, interaction of the two
hybrid proteins reconstitutes the functional activator protein and results in expression of the
reporter gene, which is detected by an assay for the reporter gene product.

It will be appreciated that fusion proteins may be used in the above-described methods.
In particular, KLK-L Related Proteins fused to a glutathione-S-transferase may be used in the
methods.

The reagents suitable for applying the methods of the invention to evaluate compounds 
that modulate a KLK-L Related Protein may be packaged into convenient kits providing the 
necessary materials packaged into suitable containers. The kits may also include suitable 
supports useful in performing the methods of the invention.
4.3 Compositions and Treatments 

The substances or compounds identified by the methods described herein, antibodies, 
and antisense nucleic acid molecules of the invention, and peptides may be used for modulating
the biological activity of a KLK-L Related Protein, and they may be used in the treatment of
conditions such as cancer (e.g. prostate, testicular, or breast cancer). Accordingly, the
substances, antibodies, peptides, and compounds may be formulated into pharmaceutical
compositions for administration to subjects in a biologically compatible form suitable for
administration in vivo. By "biologically compatible form suitable for administration in vivo" is
meant a form of the active substance to be administered in which any toxic effects are
outweighed by the therapeutic effects. The active substances may be administered to living 
organisms including humans, and animals. Administration of a therapeutically active amount 
of a pharmaceutical composition of the present invention is defined as an amount effective, at 
dosages and for periods of time necessary to achieve the desired result.-Fof example, a 
therapeutically active amount of a substance may vary according to factors such as the disease 
state, age, sex, and weight of the individual, and the ability of antibody to elicit a desired 
response in the individual. Dosage rcgima may be adjusted to provide the optimum therapeutic 
response. For example, several divided doses may be administered daily or the dose may be 
proportionally reduced as indicated by the exigencies of the therapeutic situation. 

The active substance may be administered in a convenient manner such as by injection 
(subcutaneous, intravenous, etc.), oral administration, inhalation, transdermal application, or 
rectal administration. Depending on the route of administration, the active substance may be
coated in a material to protect the substance from the action of enzymes, acids and other natural
conditions that may inactivate the substance.

The compositions described herein can be prepared by per se known methods for the 
preparation of pharmaceutically acceptable compositions which can be administered to subjects,
such that an effective quantity of the active substance is combined in a mixture with a
pharmaceutically acceptable vehicle. Suitable vehicles are described, for example, in 
Remington's Pharmaceutical Sciences (Remington's Pharmaceutical Sciences, Mack Publishing 
Company, Easton, Pa., USA 1985). On this basis, the compositions include, albeit not 
exclusively, solutions of the active substances in association with one or more pharmaceutically 
acceptable vehicles or diluents, and contained in buffered solutions with a suitable pH and iso- 
osmotic with the physiological fluids. 

Based upon their homology to genes encoding kallikrein, nucleic acid molecules of the 
invention may also be useful in the treatment of conditions such as hypertension, cardiac
hypertrophy, arthritis, inflammatory disorders, and blood clotting disorders.

Vectors derived from retroviruses, adenovirus, herpes or vaccinia viruses, or from 
various bacterial plasmids, may be used to deliver nucleic acid molecules to a targeted organ,
tissue or cell population. Methods well known to those skilled in the art may be used to
construct recombinant vectors which will express antisense nucleic acid molecules of the 
invention. (See, for example, the techniques described in Sambrook et al (supra) and Ausubel 
et al (supra)). 

The nucleic acid molecules comprising full length cDNA sequences and/or their 
regulatory elements enable a skilled artisan to use sequences encoding a protein of the invention 
as an investigative tool in sense (Youssoufian H and HF Lodish 1993 Mol Cell Biol 13:98-104) 
or antisense (Eguchi et al (1991) Annu Rev Biochem 60:631-652) regulation of gene function. 
Such technology is well known in the art, and sense or antisense oligomers, or larger fragments, 
can be designed from various locations along the coding or control regions. 

Genes encoding a protein of the invention can be turned off by transfecting a cell or 
tissue with vectors which express high levels of a desired KLK-L-encoding fragment. Such
constructs can inundate cells with untranslatable sense or antisense sequences. Even in the 
absence of integration into the DNA, such vectors may continue to transcribe RNA molecules 
until all copies are disabled by endogenous nucleases. 

Modifications of gene expression can be obtained by designing antisense molecules, 
DNA, RNA or PNA, to the regulatory regions of a gene encoding a protein of the invention, i.e.
the promoters, enhancers, and introns. Preferably, oligonucleotides are derived from the
transcription initiation site, e.g. between -10 and +10 regions of the leader sequence. The
antisense molecules may also be designed so that they block translation of mRNA by preventing 
the transcript from binding to ribosomes. Inhibition may also be achieved using "triple helix" 
base-pairing methodology. Triple helix pairing compromises the ability of the double helix to 
open sufficiently for the binding of polymerases, transcription factors, or regulatory molecules. 
Therapeutic advances using triplex DNA were reviewed by Gee J E et al (In: Huber B E and B 
I Carr (1994) Molecular and Immunologic Approaches, Futura Publishing Co, Mt Kisco N.Y.). 

Ribozymes are enzymatic RNA molecules that catalyze the specific cleavage of RNA. 
Ribozymes act by sequence-specific hybridization of the ribozyme molecule to complementary 
target RNA, followed by endonucleolytic cleavage. The invention therefore contemplates 
engineered hammerhead motif ribozyme molecules that can specifically and efficiently catalyze 
endonucleolytic cleavage of sequences encoding a protein of the invention. 

Specific ribozyme cleavage sites within any potential RNA target may initially be
identified by scanning the target molecule for ribozyme cleavage sites which include the
following sequences: GUA, GUU and GUC. Once the sites are identified, short RNA sequences
of between 15 and 20 ribonucleotides corresponding to the region of the target gene containing
the cleavage site may be evaluated for secondary structural features which may render the
oligonucleotide inoperable. The suitability of candidate targets may also be determined by 
testing accessibility to hybridization with complementary oligonucleotides using ribonuclease 
protection assays. 
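
The initial scanning step described above can be sketched in Python. This is a minimal illustration only: the function name, the choice to centre the window on the triplet, and the default window of 20 nucleotides are assumptions made for the example, and no secondary-structure evaluation or accessibility testing is attempted.

```python
def find_ribozyme_sites(rna, window=20):
    """List candidate hammerhead cleavage sites (GUA, GUU, GUC).

    Returns (position, triplet, candidate) tuples, where `candidate` is a
    stretch of up to `window` nucleotides around the triplet. The windowing
    rule is an illustrative assumption, not taken from the text.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    triplets = ("GUA", "GUU", "GUC")
    sites = []
    for i in range(len(rna) - 2):
        tri = rna[i:i + 3]
        if tri in triplets:
            # roughly centre the window on the triplet, clipped to the ends
            start = max(0, i + 1 - window // 2)
            end = min(len(rna), start + window)
            start = max(0, end - window)
            sites.append((i, tri, rna[start:end]))
    return sites
```

Each returned candidate would still need the structural and hybridization checks described in the text before being considered a usable target.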

Methods for introducing vectors into cells or tissues include those methods discussed 
herein and which are suitable for in vivo, in vitro and ex vivo therapy. For ex vivo therapy, 
vectors may be introduced into stem cells obtained from a patient and clonally propagated for 
autologous transplant into the same patient (See U.S. Pat Nos. 5,399,493 and 5,437,993).
Delivery by transfection and by liposome are well known in the art. 

The nucleic acid molecules disclosed herein may also be used in molecular biology 
techniques that have not yet been developed, provided the new techniques rely on properties of 
nucleotide sequences that are currently known, including but not limited to such properties as 
the triplet genetic code and specific base pair interactions.

The activity of the substances, compounds, antibodies, nucleic acid molecules, and
compositions of the invention may be confirmed in animal experimental model systems.

The following non-limiting example is illustrative of the present invention:

Example 

MATERIALS AND METHODS 

Identification of positive PAC and BAC genomic clones from a human genomic DNA 
library 

The sequences of the PSA, KLK1, KLK2, NES1 and Zyme genes are already known.
Polymerase chain reaction (PCR)-based amplification protocols have been developed which
allowed generation of PCR products specific for each one of these genes. Using these PCR
products as probes, labeled with 32P, a human genomic DNA PAC library and a human genomic
DNA BAC library were screened for the purpose of identifying positive clones approximately
100-150 Kb long. The general strategies for these experiments have been published elsewhere
(14). The genomic libraries were spotted in duplicate on nylon membranes and positive clones
were further confirmed by Southern blot analysis as described (14).
DNA sequences on chromosome 19 

The Lawrence Livermore National Laboratory participates in the Human Genome Project
and focuses on sequencing chromosome 19. Extensive sequence
information on this chromosome is available at the website of the Lawrence Livermore National
Laboratory (http://www-bio.llnl.gov/genome/genome.html).

Approximately 300 Kb of genomic sequences were obtained from that website,
encompassing a region on chromosome 19q13.3 - 13.4, where the known kallikrein genes are
localized. This 300 Kb of sequence is represented by 8 contigs of variable lengths. By using
a number of different computer programs, an almost contiguous sequence of the region was
established as shown diagrammatically in Figure 1. Some of the contigs were reversed as shown
in Figure 1 in order to reconstruct the area on both strands of DNA.

Using the published sequences of PSA, KLK2, NES1 and Zyme and the alignment
software BLAST 2, the relative positions of these genes on the
contiguous map were identified (Figure 1). These known genes served as landmarks for further
studies. An EcoRI restriction map of the area is also available at the website of the Lawrence
Livermore National Laboratory. Using this restriction map and the computer program
WebCutter (http://www.firstmarket.com/cutter/cut2.html), a restriction analysis of the
available sequence was performed to further confirm the assignment and relative positions of
these contigs along chromosome 19. The obtained configuration and the relative location of the
known genes are presented in Figure 1.
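
As an illustration of the kind of in silico restriction analysis described above, the following Python sketch computes the EcoRI fragment lengths of a linear sequence. It is not the WebCutter implementation; the function name is hypothetical, and the only biological fact relied upon is that EcoRI recognizes GAATTC and cuts after the first G.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site; cut falls between G and AATTC


def ecori_fragments(seq):
    """Return the fragment lengths from an EcoRI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cuts.append(pos + 1)  # cut position is just after the leading G
        pos = seq.find(ECORI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

Comparing the predicted fragment lengths of each contig with an experimentally derived restriction map is one way the order and orientation of contigs can be cross-checked.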
Gene prediction analysis 

For exon prediction analysis of the whole genomic area, a number of different computer 
programs were used. These programs are listed in Table 1. All these programs were initially 
tested using known genomic sequences of the PSA, Zyme, and NES1 genes. The more reliable 
computer programs, GeneBuilder (gene prediction), GeneBuilder (exon prediction), Grail 2 and 
GENEID-3 were selected for further use.
Protein homology searching 

Putative exons of the new genes were first translated to the corresponding amino acid
sequences. BLAST homology searching for the proteins encoded by the exons of the putative
new genes was performed using the BLASTP program and the GenBank databases.

RESULTS

Relative position of PSA, KLK2, Zyme and NES1 on Chromosome 19 

Screening of the human BAC library identified two clones which were positive for the
Zyme gene (clones BAC 288H1 and BAC 76F7). These BACs were further analyzed by PCR
with primers specific for PSA, NES1, KLK1 and KLK2. These analyses indicated that both
BACs were positive for Zyme, PSA and KLK2 and negative for the KLK1 and NES1 genes.

Screening of the human PAC genomic library identified a PAC clone which was
positive for NES1 (clone PAC 34B1). Further PCR analysis indicated that this PAC clone was
positive for the NES1 and KLK1 genes and negative for PSA, KLK2 and Zyme. Combination of
this information with the EcoRI restriction map of the region allowed establishment of the
relative positions of these four genes. PSA is the most centromeric, followed by KLK2, Zyme
and NES1. Further alignment of the known sequences of these genes with the 300 Kb contig
enabled precise localization of all four genes and determination of the direction of transcription,
as shown by the arrows in Figure 1. The KLK1 gene sequence was not identified on any of
these contigs and appears to be further telomeric to NES1 (since it is co-localized on the same
PAC as NES1).
Identification of new genes

The following set of rules was used to decide whether a new gene was present in the genomic
area of interest:

1. Clusters of at least 3 exons should be found.

2. Only exons with a high prediction score ("good" or "excellent" quality, as indicated by the
searching programs) were considered for the construction of the putative new genes.

3. Predicted exons were considered reliable only if they were identified by at least two
different exon prediction programs.
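
The three rules above can be expressed as a simple filter. The sketch below is a hedged illustration in Python: the `Exon` record, the requirement that two programs report identical coordinates, and the `max_gap` distance used to group exons into a cluster are all assumptions introduced for the example, since the text does not specify how agreement between programs or clustering distance was measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int
    end: int
    quality: str   # e.g. "excellent", "good", "marginal" (labels illustrative)
    program: str   # name of the predicting program


def reliable_exons(predictions, min_programs=2):
    """Rules 2 and 3: keep good/excellent exons found by >= min_programs."""
    support = {}
    for e in predictions:
        if e.quality in ("good", "excellent"):
            support.setdefault((e.start, e.end), set()).add(e.program)
    return sorted(k for k, progs in support.items() if len(progs) >= min_programs)


def putative_genes(exons, max_gap=5000, min_exons=3):
    """Rule 1: group sorted exons into clusters and keep those with >= min_exons.

    max_gap (bp) is a hypothetical clustering threshold, not from the text.
    """
    clusters, current = [], []
    for start, end in exons:
        if current and start - current[-1][1] > max_gap:
            clusters.append(current)
            current = []
        current.append((start, end))
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_exons]
```

In practice, predictions from different programs rarely share exact boundaries, so a real pipeline would likely compare exons by overlap rather than by identical coordinates.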

By using this strategy, eleven putative new genes were identified, of which three were
found on subsequent homology analysis to be known genes not previously mapped, i.e. the
human stratum corneum chymotryptic enzyme (HSCCE), human neuropsin, and trypsin-like
serine protease (TLSP). Their relative location is shown in Figure 1. In addition, one other
putative new gene (gene UG) was identified which showed no homology, at the protein level,
with the kallikrein proteins. The five remaining genes all have variable homologies with known
human or animal kallikrein proteins and/or other known serine proteases (depicted as KLK-L1,
KLK-L2, KLK-L3, KLK-L4 and KLK-L5 in Figure 1).

In Tables 2 to 7, the preliminary exon structure and partial protein sequence for each one
of the six newly identified genes are shown. In Table 8, some proteins are presented which
appear, on preliminary analysis, to be homologous to the proteins encoded by the putative new 
genes. 

DISCUSSION 

Prediction of protein-coding genes in newly sequenced DNA becomes very important 
after the establishment of large genome sequencing projects. This problem is complicated due 
to the exon-intron structure of the eukaryotic genes which interrupts the coding sequence in 
many unequal parts. In order to predict the protein-coding exons and overall gene structure, a 
number of computer programs were developed. All these programs are based on the 
combination of potential functional signals with the global statistical properties of known 
protein-coding regions (15). However, the most powerful approach for gene structure prediction 
is to combine information about potential functional signals (splice sites, translation start or stop 
signal etc.) together with the statistical properties of coding sequences (coding potential) along 
with information about homologies between the predicted protein and known protein families 
(16). 

In mouse and rat, kallikreins are encoded by large multigene families and these genes
tend to cluster in groups with a distance as small as 3.3 - 7.0 Kb (3). A strong conservation of
gene order between human chromosome 19q13.1 - q13.4 and 17 loci in a 20-cM proximal part
of mouse chromosome 7, including the kallikrein locus, has been documented (17).

In humans, only a few kallikrein genes have been identified. In fact, only KLK1, KLK2 and
KLK3 (PSA) are considered to represent the human kallikrein gene family (9). The work
described herein provides strong evidence that a large number of kallikrein-like genes are
clustered within a 300 Kb region around chromosome 19q13.2 - q13.4. The three established
human kallikreins (KLK1, KLK2, KLK3), Zyme and NES1, as well as the stratum corneum
chymotryptic enzyme, neuropsin, and TLSP (trypsin-like serine protease) and another five new
genes, KLK-L1 to KLK-L5, may constitute a large gene family. This will bring the total number
of kallikrein or kallikrein-like genes in this region of chromosome 19 to thirteen.

The human stratum corneum chymotryptic enzyme (19), neuropsin (20) and trypsin-like 
serine protease (TLSP) (21) are three previously characterized genes which have many structural 
similarities with the kallikreins and other members of the serine protease family. However, they
have not been mapped in the past. Their precise mapping in the region of the kallikrein gene 
family indicates that these three genes, along with the ones that were newly identified, or are 
already known, constitute a family that likely originated by duplication of an ancestral gene. 
The relative localization of all these genes is depicted in Figure 1. 

Kallikreins are a subfamily of serine proteases, traditionally characterized
by their ability to liberate lysyl-bradykinin (kallidin) from kininogen (18). More recently,
however, a new, structural concept has emerged to describe kallikreins. From accumulated
sequence data, it is now clear that the mouse has many genes with high homology to kallikrein
coding sequences (19-20). Richard and co-workers have contributed to the concept of a
"kallikrein multigene family" to refer to these genes (21-22). This definition is based less
on the specific enzymatic function of the gene product than on sequence homology and
close linkage on mouse chromosome 7. In humans, only KLK1 meets the functional
definition of a kallikrein. KLK2 has trypsin-like enzymatic activity and KLK3 (PSA) has very
weak chymotrypsin-like enzymatic activity. These activities of KLK2 and KLK3 are not known
to liberate biologically active peptides from precursors. Based on the newer definition, members
of the kallikrein family include not only the gene for the kallikrein enzyme but also genes
encoding other homologous proteases, including the enzyme that processes the precursors of the
nerve growth factor and epidermal growth factor (8). Therefore, it is important to note the clear
distinction between the enzyme kallikrein and a kallikrein or a kallikrein-like gene.

In carrying out the study, only exons predicted with "good" or "excellent" quality
and by at least two different programs were considered. Moreover, the presence of a
putative gene was only considered when at
least three exons clustered coordinately in that region. Additional evidence that these new genes
are indeed homologous to the known kallikreins and other serine proteases comes from
comparison of the intron phases. As published previously (14), trypsinogen, PSA and NES1
have 5 coding exons of which the first has intron phase I (the intron occurs after the first
nucleotide of the codon), the second has intron phase II (the intron occurs after the second
nucleotide of the codon), the third has intron phase I and the fourth has intron phase 0 (the
intron occurs between codons). The fifth exon contains the stop codon. The intron phases of
the predicted new kallikrein-like genes follow these rules and are shown in the respective tables.
Further support comes from the identification, in the new genes, of the conserved amino acids
of the catalytic domain of the serine proteases, as presented in Tables 2-6.
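
The intron-phase comparison described above reduces to simple arithmetic: the phase of the intron following a coding exon is the cumulative coding length up to that point, modulo 3. The Python sketch below illustrates this under stated assumptions. The function names are hypothetical, the exon lengths used in the usage note are invented for illustration and are not taken from any gene in this document, and phases I and II are written as 1 and 2.

```python
# Phase pattern reported in the text for trypsinogen, PSA and NES1
# (introns after coding exons 1-4): I, II, I, 0.
KALLIKREIN_PHASES = [1, 2, 1, 0]


def intron_phases(coding_exon_lengths):
    """Return the phase of each intron, given coding exon lengths in order.

    Phase 0: intron lies between codons; phase 1: after the first
    nucleotide of a codon; phase 2: after the second nucleotide.
    """
    phases, total = [], 0
    for length in coding_exon_lengths[:-1]:  # no intron after the last exon
        total += length
        phases.append(total % 3)
    return phases


def follows_kallikrein_pattern(coding_exon_lengths):
    """True if the gene has the I-II-I-0 intron phase pattern."""
    return intron_phases(coding_exon_lengths) == KALLIKREIN_PHASES
```

For example, hypothetical coding exon lengths of 46, 160, 287, 137 and 153 nucleotides give cumulative lengths of 46, 206, 493 and 630, and therefore phases 1, 2, 1 and 0.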

In order to test the accuracy of the computer programs, known genomic areas containing 
the PSA, Zyme and KLK2 genes were tested. Two of these programs (Grail 2 and GeneBuilder) 
were able to detect about 95% of the tested known genes (data not shown). Matches with 
expressed sequence tag sequences (EST) can also be employed for gene structure prediction in 
the GeneBuilder program and this can significantly improve the power of the program especially 
at high stringency (e.g. >95% homology). 

In mouse, ten of the kallikrein genes appear to be pseudogenes (9). One of the new
genes (UG) does not show homology with the kallikrein genes. However, it has some protein
homology with myelin associated glycoprotein (Table 8). There may still be an association
between UG and the kallikrein genes, since some mouse kallikreins are related to nerve growth
factor, as discussed earlier (8), and Zyme, as well as neuropsin and TLSP, were found to be
highly expressed in brain tissue, and it is claimed that Zyme may be related to Alzheimer's
disease (11).

Having illustrated and described the principles of the invention in a preferred 
embodiment, it should be appreciated by those skilled in the art that the invention can be
modified in arrangement and detail without departure from such principles. All modifications 
coming within the scope of the following claims are claimed. 

All publications, patents and patent applications referred to herein are incorporated by 
reference in their entirety to the same extent as if each individual publication, patent or patent 
application was specifically and individually indicated to be incorporated by reference in its 
entirety. 
Table 1. Exon or gene-prediction programs used in this study

No.  Program name                   Source                                         Website or e-mail address

1    GeneBuilder (gene prediction)  Institute of Advanced Biomedical Technologies  http://125.itba.mi.cnr.it/~webgene/genebuilder.html
2    GeneBuilder (exon prediction)  Institute of Advanced Biomedical Technologies  http://125.itba.mi.cnr.it/~webgene/genebuilder.html
3    ORF gene                       Institute of Advanced Biomedical Technologies  http://125.itba.mi.cnr.it/~webgene/wwworfgene2.html
4    GENEID-3                       BioMolecular Engineering Research Center,      http://apolo.imim.es/geneid.html
                                    Boston University                              (geneid@darwin.bu.edu)
5    Grail 2                        Oak Ridge National Laboratory                  http://compbio.ornl.gov
6    FGENEH                         Baylor College of Medicine, Houston, Texas     http://mcrb.bcm.tmc.edu

Note: In the final analysis of the sequences, only programs 1, 2, 4 and 5 were used.

Table 8. Homology between the predicted amino acid sequences of the newly identified putative genes and
protein sequences deposited in GenBank

No.  Gene identity  Homologous known protein                      Identity % (number of amino acids)

1    KLK-L1         Human stratum corneum chymotryptic enzyme     44 (101/227)
                    Rat kallikrein                                40 (96/237)
                    Mouse glandular kallikrein K22                39 (94/236)
                    Human glandular kallikrein                    38 (93/241)
                    Human prostatic specific antigen              37 (91/241)
                    Human protease M                              37 (87/229)

2    KLK-L2         Human neuropsin                               48 (106/219)
                    Human stratum corneum chymotryptic enzyme     47 (103/216)
                    Human protease M                              45 (99/219)
                    Human trypsinogen I                           45 (100/221)
                    Rat trypsinogen                               [illegible]

3    KLK-L3         Human neuropsin                               44 (109/244)
                    Rat trypsinogen 4                             39 (95/241)
                    Human protease M                              38 (98/253)
                    Human glandular kallikrein                    37 ([illegible])
                    Human prostatic specific antigen              36 (89/242)

4    KLK-L4         Human protease M                              52 (118/225)
                    Human neuropsin                               51 (116/225)
                    Mouse neuropsin                               51 (116/226)
                    Human glandular kallikrein                    48 (113/234)
                    Human prostatic specific antigen              47 (108/227)

5    KLK-L5         Human neuropsin                               44 (81/184)
                    Rat trypsinogen I                             42 (76/178)
                    Rat trypsinogen II                            42 (75/178)
                    Human protease M                              41 (73/178)

6    UG             Human myeloid cell surface antigen CD33       61 (144/233)
                    Human OB binding protein-2                    50 (166/328)
                    Human OB binding protein-1                    43 (189/431)
                    Human myelin associated glycoprotein          27 (86/311)

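The identity values in Table 8 pair a percentage with a ratio of identical residues to aligned residues. The legible entries appear consistent with the percentage being that ratio truncated to a whole number; this is an inference from the tabulated values, not something stated in the text. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def percent_identity(identical, aligned):
    """Whole-number percent identity, truncated rather than rounded."""
    return 100 * identical // aligned


# A few (identical, aligned) pairs as they appear in Table 8
examples = {
    "KLK-L1 vs human stratum corneum chymotryptic enzyme": (101, 227),
    "KLK-L4 vs human protease M": (118, 225),
    "UG vs human CD33": (144, 233),
    "UG vs human myelin associated glycoprotein": (86, 311),
}

for name, (identical, aligned) in examples.items():
    print(f"{name}: {percent_identity(identical, aligned)}%")
```
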

We Claim: 
1. An isolated nucleic acid molecule which comprises:

(i) a nucleic acid sequence encoding a protein having substantial sequence identity,
preferably at least 60% sequence identity, with an amino acid sequence of
KLK-L1-KLK-L5 as shown in Tables 2 to 6;

(ii) a nucleic acid sequence encoding a protein comprising an amino acid
sequence of KLK-L1-KLK-L5 as shown in Tables 2 to 6;

(iii) nucleic acid sequences complementary to (i);

(iv) a degenerate form of a nucleic acid sequence of (i);

(v) a nucleic acid sequence capable of hybridizing under stringent conditions to a
nucleic acid sequence in (i), (ii) or (iii);

(vi) a nucleic acid sequence encoding a truncation, an analog, an allelic or species
variation of a protein comprising an amino acid sequence of
KLK-L1-KLK-L5 as shown in Tables 2 to 6; or

(vii) a fragment, or allelic or species variation of (i), (ii) or (iii).



ABSTRACT OF THE DISCLOSURE 

The invention relates to nucleic acid molecules, proteins encoded by such nucleic acid 
molecules, and use of the proteins and nucleic acid molecules.

KLK-L1



TATCTCATGAGAGAGAATAAGAACATGAAAAGAGAAAGAATGAGAGAGAG 

AGAGAGAAAGAAAAAGGAGAGTGGAGTCTAGGATCTGGGCAGGGGTCTCC 

TCCCTGGGTCCCTAGACCCTGCTGCCAGCCCCTTCTGGGCCCCCAACCAC 

TGCCTGGTCAGAGTTGAGGCAGCCTGAGAGAGTTGAGCTGGAAGTTTGCA 

GCACCTGACCCCTGGAACACATCCCCTGGGGGCAGGCCAGCCCAGGCTGA 

GGATGCTTATAAGCCCCAAGGAGGCCCCTGCGGAGGCAGCAGGCTGGAGC 

TCAGCCCAGCAGTGGAATCCAGGAGCCCA.GAGGTGGCCGGGTAAGAGGCC 

TGGTGGTCCCCCACTAAAAGCCTGCAGTGTTCATGATCCAACTCTCCCTA 

CAGCTCCATGTCGCTGGATTCTCAGCCTCTGTGCCTTCTGTCTCCACATC 

TCTCTAGACAGATCTCTCACTGTCTCTAGTTAGGAGTCACTGTCTCTAGT 

TAGGGGTCTCTCTGTCTCTCTGAATCTATATCTCCATGTCTAACTCTCAG 

ACTGTCTCTGAGGATATCTCTCAAGCACTCTGTCTCTCCGGCTCTGATTC 

TCTGTGTGTCTTCCCTCCATGCTTGTTTGTGGGTGGCTAGACACCATCTC 

TCCCCATTCACAGATGGCTAGATGCTTTCTCTAAA CTTTCC TTTCTACCT 

AGTTCTCTCTCTCTCTCTTTTCCCATCTCTCTCTCTCTTTTTCTCTCTCA 

GTCTCTAAATCTGTCTCTCTAGGTTCTGGGTCCATGGATGGGAGAGGGGG 

TAGATGGTCTAGGCTCTTGCCTACCTAATAACGTCCCAGAGGGAAGAAAG 

GGAGGGACAAAGAGAGGGATGGAGAGACTTGGGCTGAAGATCCCCAGACA 

CGGCTAAGTCTCAGTCCTCATCCCCAGGTGCTGACGTGATGGCCACAGCA 

GGAAATCCCTGGGGCTGGTTCCTGGGGTACCTCATCCTTGGTGTCGCAGG 

TATCTGAGTATGCGTGTGTGTGTCTGTCCGTGCTTGGGGGCACAGTGTTT 

GTTAATGTTCAGGTGTGACTCAGTGTCCTCTTGCTTGTGACTGCAAAGCT 

GCCTGTGAGACGGTACCGTGTTATCCGTCCGCCATGGCTGTGCCCCTGCA 

ACTCCTTGTATCGTGGTAAATTTGTGTGTGGCAGTGTGCCTGGGTGTGTG 

GTTGTACCTGTGAGACTCTGACAGTTTGTGCCTCTGAATATCTGGTGGAG 

TGACAACAGTGTAATGATGATATGGGGACAGGGGAAGCCGAGGGTGCAGG 

AGATTGTGCTTCCTGGGGCGTGATCCATTGCTGGGAATCTGTGCCTGCTT 

CCTGGGTCTTCAGTCCTGAGATCCCCCTCTCCCATCCCCAAGGAACTCAC 

CTCACAGGACTATAAAACGGTGTTTTGGTGTGCATGGGCTTGTGGCTTGG 

TGTGACTGTGGGCAAGGCTGGGAGAGGATAGGAGTGACTCGGCGCAGGAC 

CGACTCTTTGAGCATCAGTCTGCGCAGACAAGTGACCCGATCCTTGCTCC . 

CAGCAACAACTCCACCCCCTGAGCTTTAATTCAeCCCGAAGGACCCGATC 

CTACCGCTATGAGCCTAGACTCCTCTGTTGAACCCCTCCTGACCGTGGCT 

TTGCACCGCGATGGCACCAGTCTCACCTCCAGAGCTCACCCCAGAGCCCT 

GACTCCGCCCCAGAAGCCCTGGTCCCACCTTCTGAGACTGCCTCTAGCCA 

TAACCCAGCTCTTGAAGCCTTGATGGCGCCCCTGCGCTGTAACCCCAACC 

CTAGGAGCACTGATCCCGCCTTCTCAGCCCACCCCCATGCCCTGACTCTC 




FIGURE 2 (cont'd) 

CTCCCAGGAGCCCTGACTACCCTGAATCCCTGACCAGGCTCCTGCACCGT 
GATCACCGCCCCTGGGAGCCCTAGGCCTATATCCTGGACCAGCCCCTGAA 
GCTCCGATCATGACCCCTGCACCATAACCCCACCCCCAGGAGCCCTGGGT 
CCGCGCCCTGGGCCCGG€CeCAGGGCTGAGTCGGG€CCeCAAGAGT€*CTG 
ACTGCTCCTGAAGCCCTGAGCACGCCCCTGCTCGGTAACCCCTCeee€AA 
GAGCCCTGGGCCCGCCTCCTGAGCCCGTTCCCAGGGCTGAGTC<SGGGCCG 
AGGAGCCCTGACTGCTCCTGAACCTCTGAGCACGCeCCTGCTCGGTAAGC 
CCACCCCCAGGAACCCTGGGCCCGCCTCCTGGTCCCGATCCCATGCCEGA 
CTCCGCCCTCA GGATCTCTCGTCTCTGGTAGCTGCAGCCAAATCATAAAC 
GGCGAGGACTGCAGCCCGCACTCGCAGCCCTGGCAGGCGGCACTGGTCAT (1) 
GGAAAACGAATTGTTCTGCTCGGGCGTCCTGGTGCATCCGCAGTGGGTGC 
TGTCAGCCGCACACTGTTTCCAGAA GTGAGTGCAGAGGTAGGGGGAGTGG 
GCAGGGCCTGGGTCCGGGGGCGGGGCCTAATATCAGGCTCATCTTGGGGT 
GCTCAGGGGGAAACAGCGGTGAAGGCTCTGGGAGGAGGACGGAATGAGCC 
TGGATCCGGGGAGCCCAGAGGGAAGGGCTGGGAGGCGGGAATCTTGCTTC — - 
GGAAGGACTCAGAGAGCCCTGACTTGAAATCTCAGCCCAGTGCTGAGTCT 
CTAGTGAACTAAGGCAAGTTCTTGTCCCTGAATTTTTGTGAATGAGGATT
TGAGACCATGGTTAAGTAGCTCTTAGGGTGTTTAGCGAAGAGGGTGGGGT
TGGGGTTAGGAGATGGGGATGGGAATGGGGTTGAAGATGAGAATGGAGGT
AAGCATGTAGTTGCCACAAAACTGACCTGCGCTCGGTGGCCCACAGCTCC
TACACCATCGGGCTGGGCCTGCACAGTCTTGAGGCCGACCAAGAGCCAGG (2)
GAGCCAGATGGTGGAGGCCAGCCTCTCCGTACGGCACCCAGAGTACAACA
& GACCCTITGCTCGGTAACGACCTCATGCTCATGAAGTTGGACGAATCCGTG 
\ TCCGAGTCTGACACCkTCCGGAGCATCAGCATtGCTTCGGAGTG€CCTAC 
H CGCGGGGAAGTCTFTGCCrCGTTTCTGG®rGGGGTC^GCTGG€GAAGGGTG 
^ AGCTCACGGGTGTGTGTCTGCCCTCTTCAAGGAGGTCCTCTGCCCAGTCG- 
IS GfiOnfiOGTOAGrrAGAGCTGTGCnTCeCA GGCAGAATGCCTACCGTGCTG 
J*n CAGTGCGTGAACGTGTCGGTGGTGTCTGAGGAGGTCTGGAGTAAGCTCTA (3) 

5 TGACCCGCTGTACCACCCCAGCATGTTCTGCGCCGGCGGAGGGCAAGACC 
AGAAGGACTCCTGCAACG TGAGAGAGGGGAAAGGGGAGGGCAGGCGACTC 
AGGGAAGGGTGGAGAAGGGGGAGACAGAGACACACAGGGCCGCATGGCGA 
GATGCAGAGATGGAGAGACACACAGGGAGACAGTGACAACTAGAGAGAGA 
AACTGAGAGAAACAGAGAAATAAACACAGGAATAAAGAGAAGCAAAGGAA 
GAGAGAAACAGAAACAGACATGGGGAGGCAGAAACACACACACATAGAAA 
TG<:AGTTGACCTTCCAACAGCATGG<3GCCTGAGGGCGGTGACCTCCACCC 
AATAGAAAATCCTCTTATAACTTTTGACTCCCCAAAAACCTGACTAGAAA 
TAGCCTACTGTTGACG<KXjAGCCTTACCAATAACATAAA TAGTCG ATTTA 
TGCATACGTTTTATGCATTCATGATATACCTTTGTTGGAATTTTTTGATA 
TTTCTAAGCTACACAGTTCGTCTGTGAATTTTTTTAAATTGTTGCAACTC 
TCCTAAAATTTTTCTGATGTGTTTATTGAAAAAATCCAAGTATAAGTGGA 
CTTGTGCAGTTCAAACCAlGG<}TTGTTCAAG<3GTCA*AGTGTGTXeCCA"GAG 
GGAAACAGTGACACAGATTCATAGAGGTGAAAGACGAAGAGAAAGAGGAA 
AAATCAAGACTCTACAAAGAGGCTGGGCAGGGTGGeTCATiGGCTGTAA^C 
CCAGCACTTTGGGAGGGGAG<}CAGGGAGATCAG«T^®'AGGTAA!GGA©TTC^ 
AGACCAGGCTGG^CAAAATGGTGAAA<rceTGTCTGmG«TA^AAATA*©AAA- 
AGTTAGCTGGATATGGTGGCA'GGCGCCTGTAATCCCAGCTACTTGGGAGG 
CTGAGGCAGGAGAATTGCTTGAATATGGGAGGCAGAGGTTGAAGTGAGTT

GAGATCACACCACTATACTCCAGCTGGGGCAACAGAGTAAGACTCTGTCT 

CAAAAAAAAAAAAAAAAAAGACTTTACAAAGAGATGCAGAGACACTGAGA 

CAGATAAACAAGCCACAAAGGAGACAAAGGAGAGACAGACAAACAGAAAC 

AGACAGACCACAAGCCCAAGAGAAGCAGCCAGCATTCAGGACATAGGACA 

TCGGGAAGCAGGATTAGATGAAGTCAGGGATCTGGAATGGGACTTCCAAC 

AGATATGTTGCTGGGCTATGTTGTTATTGATGATGGTTCTGTCTTTGTTT 

CTCAGTCTCATTTAGTTCCTTTCTGAGCCCATATCCATTTCCACCTCTCT 

jTGTTTT 7 ! * ^ TTT^ ^ r^Tr-r rn-r-Tr-TTT ATA AT A PHCTG ACTCTGGGG 

GGCCCCTGATCTGCAACGGGTACTTGrAGGGCCT TGTGTCTTTCGGAAAAX^-. 

GCCCCGTGTGGCCAAGTTGGCGTGrrAGGTGTC TAGACCAACCTCTGCAA 

ATTCACTGAGTGGATAGAGAAAACCGTGGAGGCCAGTTAA STOP 



FIGURE 3 



KLK-L2



GGGCCCAGAG TGAAGGCAAG AGAAGGAGTT GAGAGCTCCC TCTGCAAAGT GGCTTGAGTC 
TCCCCTGGGT AAAATGCAGG GAGAGGGAGG CAGAAAGACA GGGAAGAGGAvAGGGGTGGGG^- 
AAGAAAGAGA GAGAGAGAGA GAGACAGAAT AACACAACTA CAGAAACACA G AG AG AAGAG& 
ACAGAGAGCC TGGGACACAG GGACACACAG AGTCAGAGAG AAAAGAGAAG ATAGAGAAAG^ 
ACACAAATGG AGACACAGAG GTGTAAAGAA AGAGAGATTA ACAGAGTCCC AGATAGACGC 
AAAGGGGCAG AAGCACAGTT TTCAGGGTGG TGTCTATGAT CATCTTCTTT TTTTTTTTTT 
. r i . T TTrr rI , T TTTTTGAGAC GGAGTCTCGC TCTGTCGCCC AGGCTGGAGT GCAGTGGGGG 
GATCTCGGCT CACTGCAAGC TCCGCCTCCC GGGTTCACGC CATTCTCCTG CCTC AGCCT C 
CCAAGTAGCT GGGACTACAG GCGCCCGCCA CTACGCCCGG CTAATTTTTT TGTATTTTTA 
GTAGAGACGG GGTTTCACCG TTTTAGCCGG GATGGCCTCG ATCTCCTGAC CTCGTGATCC 
GCCCGCCTCG GCCTCCCAAA GTGCTGGGAT TACAGGCGTG AGCCACCGCG CCCGGCCATG 
ATCATCTTCT TGACTATGCT GATGTGACAA GTACCTAAAG CCATCAGACT CTACCCTTTA 
AATATGCAGT TTGGGCCAGG CACCGTGGCT CATGCCTGTA ATTCCAGCAC TTTGGGAGGC 
AGAGGTGGGT GAATCACTTG AGGCCAGGAG TTTGAGACCA GCCTGGCCAA CATGGTGAAA 
CTCTGTCTTT ACTAAAAAAA AAAAAAAAAA AAAAAAAATC AGCCGGGTGT CGTGGGGCAC 
ACCTGTAATC CCAGCTATGC TGGAGGCTGA GGCACGAGAG TCACTTGAAC CCTGGAGGCG 
^ GAGGTTGCAG TGGGCCGAGA TCACATCACC GCCCTCCAGC CTGGGCGACA GAGCAAGACT 

•f; CTGTCTCAAA TAAATAAATA AACAAAOGAA CAAGCAGTTT GTTGTACCTT AGTTATATCT 

Q AAAAAAAAAA TGCTGTCAAC AAATAGAGCA GAAGTGAAAT AAAGGAAAAT AAATGGGCCA 

1^ AGAACTCTAA GGTATATTTG ACAAATCATT CAGAACCTTT AAAAAAGAAA GAATCACAGA 

W GGGATAGAAA »GAGAGGGAGG AAGAGGGAGA CAGAAAGAGG* TGTGGGGGAA ^GGAGAACAAA 

S 4 ACAAGGCTCC TAAGACAGAC AGGAGGAGAG AGAGAGAGAG TGAGTGAGAG ACAGACAGAG 

W AAAAAGACAG AGAGAGAGAG AGAGAGAGAG AGAGAGAGAG AGGCGAGAGG* -GATAGAAAGA** 

£Q GAGAGAGGGG TGGAGAGAGA CACGAGATAT TGAGAGAGAC TCAGAAAGAT AGCGGAGGGA 

If I GAACCACAGA .GAGATGGAAG AAGACTCTGA GAAAAAACCA GAGACAA^ A*vTGGAAAGAGG 

'* AGTATGGAGG^ GTGAAGAGAG * AGTGGTGG AA -TGAGCAAAAT GCAGAGAAGA AAGCAAGCAA- 

TCCAGGCGCC AAGAATAGTG ACCCAGAGTT GGTGAGAAGC^CMATCCTTA* AGGGTGGGGG 
^ AGGCAGGGAA GGGGGTGGGC TGGCTTCCGG AGACGCCTCC ; "CCATTCTCGG GGCCAGGGAG 

% GTAGGGAGTG ~ AGATTCCGGA CTGGGTGGGG - GGTGCTCTGG GGGTGGAGAT *AGGGGGAGGA 

GGAGGAGCTA TTGCTAAGGC *CCGATAGGCA "CCTCATTGCC :iOGGGAATGTG* ;CCGCAGGGAG' 
CAGTGGGTGG TTATAAGTCA GGGCCGGTGC CCAGAGCCCA GGAGGAGGGA GTGGGGAGGA 
tQ AGGCACAGGC CTGAGAAGTC TGCGGCTGAG CTGGGAGCAA ATCCCGCACG CCGTAGCTGG^ 

l 43 GGGAGAGGGG AAGTGAGAGG TGGTGAGGGT GGCTCAGCAG GCAGGGAAGG* . AGAQ{3TGTGT 

GTGCGTCCTG CACCCACATC TTTCTCTGTC CCCTCCTTGC CCTGTCTGGA GGCTGCTAGA 
CTCCTATCTT CTGAATTCTA TAGTGC CTGG GTCTCAGCGC AGTGCCGATG GTGGCCCGTC 
CTTGTGGTTC CTCTCTACCT GGGGAAATAA GGTAGGGGAG GGAGGGGAAG TGGGTTAAGG 
GCTCCCCGGA TCGCCTGGGC CTCCCAACCC TCTGACATTC CCCATCCAGG TGCAGCGGCC 
ATGGCTACAG CAAGACCCCC CTGGATGTGG GTGCTCTGTG CTCTGATCAC AGCCTTGCTT 
CTGGGGGTCA CAGG TAACCA GAACTCTGGG GTGGGAGGGT TGTGGGATTG GGAGGACTGT 
CTCTGCGGCA CTAGAGCGCC TGTCCCCTGG GGAACTGTGT GAGCCTGGGC ATGACTCCGG 
GACCGGGTGA ATGTGAGTCT CTGTCTGTAC TTGTGGTTGT GCGATCGTAT GTGGCCCTGT 
GACTGCCACG GTGTGTGTCG GGGAGGGGGA TGCCTTTTCC CATATCAGGT GACTGTGCGG 
CAGGTGGCAC TGACCCTTTG AGGCTGTGTG TGTGGTTTTG TGATTGTGTG TGCATTTAAG 
ATTGTGTGTG GCTCCACAGC TGTGTGGGTG AATGCATGTA GCACTGGGGG TGTTCACTGT 
GTGTTTGGCT GTGTGTGGTG ACT TGGCATT G T AT ATG ACT GCAGGTATCT GCAGTTCCTG 
TCCCTGAGGT CCCGGGATTG CGTGCAACAA AAGTGGTCAT CACCATGGAA AGCTGTGACT 
GTGTGGTGCT v TGCAGGGGAT ■ TATGTGATTG TGGCTGAGTG' TGAGGTTATG GATGGG<2GTA*V 
TTTGTGACCG TGTGACTACC TGAAGCTCTG TGTAGGGGTG ACTGTATGTG ACTGTGTGTG
TCTGTGTGAG GCCGTGTAAA TGCTACTGTA TGTGTGATGG TGCAGCTGTG TGTGTGGAGT
TTCTGTCTCT GCCTGGAGGG ATAGAGGGTG" CAGGGGTAGG- TATCTCTGGG AGATGGG$GG*fc 
CAGGTGAGTG ACTTGGAGTG TGTGCCTGTG TGCAGAAGAG TATGTGGGAG TCTGAAGATG
TGTGCTGACA&eGGGATCTGT^GCGTGGGAGT GAGAGAGTCT'4GGATGAGGGT^GTGGGATGGe^r 
GCTAGGCTGC CCGGGAGCGT GTGTACCTGG AGACAGAGCT GTATGTTAGC TGCACCTGTG 
GAGGCAACAT GGGCGTGTCT GCAGAACTGC GTGCGTGCTT GGCTGTTACT GCTGTTGTGC 
FIGURE 3 (cont'd)



GCGTGGTTCT TGGGGTGAGT TCGTGAATGA TGGTGGTGCC AGGGCCATCA GCAAGGGTAA 
GAACCAGGCC GGGCGCGGTG GCTCACGCCT GTAATCCCAG CCCTTTGGGA GGCCGAGGCA 
GGCGGATCAC CTGAGGTCGG GAGATCGAGG CCAGCCTGAC CAACATGGAG AACCCCGTCT 
CTACTAAAAA TACAAAAAAT TAG CTGGTGT GGTGGCGCGT GCCTGTAATC CCAGCTACTC 
GGGAGACTGG GGCAGAAAAA TCGCTTGAAC CCGGGAGGTG GAGGTTGCGG TGAGCCGAGA 
TCGCGCCATT GCACTCCAGC CTGGGCAACA AGAGCGAAAC TCCGTCTCGA AAGAAAAAAA 
GAAAAAAAAA AGGGTAAGAA CCAGTGAATG GGCACGGGAG GACTGATGAT GGAGTGGGGC 
ATGCATGTAG TCTGTAGGTC TGTGTGTGAG AGGAGGAGAT TGACAGGATT GAGAAGGCAT 
GTTTTCATCT GAGAATTCAG AAACCTAGGC CTGCTCTTCC CCTCCATGTG GCCCCCTAAG 
CTGAGCCCTT CTTTCCTGGT CCTGCTTTCG GAACCCTAGC TCCGCCCATG AGCTCTGACC 
CCACCTCCTT TCCTCAAGCA CGCCCCTAGG CCAGACTCTA GTGGACCCCG CCTAAGG CCA 
CACCCCTTTG GGCCAGGCTC CACCCCCTAT TCTGTGGGTA CCTTCTAGAA CCCCCTTCAA 
AGTCAGAGCT TTTTTTTTTT TTTTTTTGGA GACAGTCTTG CTCTCTCTCC CAGGCTGGAG 
TGCAGTGGCG TGATCTCGGC TCACTGCAAC CTCTGCCTCC CAGGTTCAAG TGATTCTCGT 
GCCTCCACCT CCTGAGTAGC TGGGATTACA GGTGCGdGCC ACCACGCCTG GCTAATTTTT 
GTGTCTTTAG TAGAGACAGG GTTTCACCTT GTTGGCCAGG CTGGTCTCAA ACTCCCAACC 
TCAGGTGATC CGCCCACCTC GGCCTCCCAG AGTGCTGGGG TTACAGGCGT GAGCCACCGC 
CCCCAGCCCA AAGTCAGAGC TCTTTATAGG AGACTCTAAC ATGTAACCCT GACCCTGGCC 
CTAACTAAGT CAATTCCAAA CCCCTTCCTG CCTCCAGCCC TGACCCCACT CACTGAGGCC 
TGACCCCACT TCTTGAGACC AGTTCCATCC CTAAAGCCCT GGTCTCCCTC CCATCCCCAG
GCTCCAGCCC CCACAGCTTT GGCACTACCC CTGAGCTTGT CCAGGAATCC TGTACCCAAT
TTTACCCTCA CATGTAGTTC TAGCCAATTC CAGGAATCTG TGAGGTCCAG TTAGAGTCCA
GTAACCCTAC CTGAGCCTGG GCTCTGTCCT TGAGCTTGAG CCTGGGCTTG AGAGGTGCCA
CTCTTATTCT CCAGGCCCTG CCCCTGCCCC CTCAGCATGT CAGACACCCA CCCTCTAGCT

GGTCTGGCCT CTTGAGTCTG AAACCCACCC CCAGCCCAAG CCCCGCCTCT GAGCCCCGCC 
CAACCCATTT TCCGTTCCCA GAGCATGTTC TCGCCAACAA TOATGTTTCC TGTGACCACC 



CCCGGTCGGA TGACAGCAGC AGCCGCATCA TCAATGGATC CGACTGCGAT ATGCACACCC
AGCCGTGGCA GGCCGCGCTG TTGCTAAGGC CCAACCAGCT CTACTGCGGG GCGGTGTTGG

GAGGAGGGTT GGTGGGGACG GGGAAGTGGG GGTGGGGGTG GGGAAGTGGG GGTGGGGGTG
TCATGGAGGT GAGGGCTGGT GGGGACGGGG AAGTGGGGTT GGGGGTGTCA TGGAAGGTGA

GGGTTGGTGG GGATGGGTTG GGGATGTGGG AGCAGGAGGA GGTCGAGTTG GGGATAGGAC 
TAAGGATGGA GTTTTGCGGG GGAG CAAGGT GGGAGGATGA GGTTGGAGAG GGGAGAGTGT 
TGTGGTAGGG AATGGGAAGG AGC CAAGGAT GGGTTGGATT TGGGGTTAGG AGCATATATT 
TGTTGAATGG TTTGGGATGG AGGTGGAATT GGGATTGGCT TTAGAATTGG GGGTGGGTGA 
AAATCGGGCT GGGGTGGAAA TGAAGATAGC ATGGAGATAG GGTTGAGATT GGGAGCAGAT 
ATAGAATGAA GGATGGGGAT TGGAGTTTTG GGTGGGGTTG GAGATGGTTG GATTTGGGCT 
TGAGAATGCA TATGGTGATG GCTTCTGGGT AGGGAAAGAA TTAGGGTTGG GAATGGGATG 
GGTTTGGAAT TGTGACTGGG ATGGGGACAG GCATGGGATT GGAGACCAAG AGGGAGTTGA 
GGATGGTTTG GGGACCGGGG GTGGGGATGG GGGTGGGGCT GGGGCTGGGT GTGGGGTTGG 
GATTGGCGTT GGACGTGGAG ATAGAGATCA GGGTTGGTGG TGACCTGCCC CATCTTCCTC 
A GAGTTTTCA GAGTCCGTCT CGGCCACTAC TCCCTGTCAC CAGTTTATG A ATCTGGGCAG 
CAGATGTTCC AGGGGGTCAA ATCCATCCCC CACCCTGGCT ACTCCCACCC TGGCCACTCT 
AACGACCTCA TGCTCATCAA ACTGAACAGA AGAATTCGTC CCACTAAAGA TGTCAGACCC 
ATCAACGTCT CCTCTCATTG TCCCTCTGCT GGGACAAAGT GCTTGGTGTC TGGCTGGGGG 
ACAACCAAQA GCCCCCAAGG TGAGTGTCCA GGTTCTTCTT GATACCGACC CATCTCTGCC 
GCCTTCCATC TTTCTCCACT TCTCATTGTG TTCCTGTTTG ACAG TGCACT TCCCTAAGGT 
CCTCCAGTGC TTGAATATCA GCCTGCTAAG TCAGAAAAGG TGCGAGGATG CTTACCCGAG 
ACAGATAGAT GACACCATGT TCTGCGCCGG TGACAAAGCA GGTAGAGACT CCTGCCAGG T 
GAGGACACCT CTCTTTATTC AGCAGATACA CACTGAGTGC CAACTCGGTA ACATGGAGCG 
TTGCCAAATT CTGAGAATCC AG C AATTGC C AAGACAGTCA GGACCCCTGT TCTCACAGAG 
CTCATACCCT AGAGTAGTGG TGTTTAG TAG AAATAATGCT GAGCTGCTTA TGTCATTTCC 
AGTTTTTTAG TAGCCACATT AAAACAGGTA AAAAAGGCTG GGCGCAGTGG CTCACACCTG 
TAATCCCAGC ACTTTGGGAG GCTGAGGCAG GCAGATCACC TTTGGTCAGG AGTTTGAGAC 
TAGC CTGGCC AACATGGCGA AACTCTGTCT CTAAAAAAAA ATACAAAAAT TAGCCTGGCA 
TGGTGGCGGG CGCCTGTAAT CTCAGCTGCT CAGGAGGCCG AGACACAAGA ATCACTTAAA 
CCCAGGAGGT GGAGGTTGCA GTGAGCTGAG ATCGTGCCAC TCACTCCAAC CTGGGAGACA
GAGTGACACT TTTGTCTCAA AAAGAAAAAA AAAAACAAGT AAAAAAGAAA CAGGTGAAGT 
TAACTTTAAT AACCCAATGT ATCCCAAATA CAATCATTTC AAAGTGTAAT TAATATAAAA 
CAATTATGAA- TGAGATAGTTTTAGAlSECmV- -TCTTGITTTO' ATATrFAAQUG* TTTGAAAGTG^ 
AGTATATATG TTATGCTGAC AGCACATCTC AATTTGGACT AGCTACATTT CAGGTGCTCA 
GTAGCCACAT GTGGCTAGCA GTTACTGTAT TGGATGGGAC GGATCTAGAG GGAAAGATCA 
GGGCTGTTTT GTATGGTTGG GCAGGTTGTG CACTGCATAA AGATAGCATA TCTAATAGGG
GCAGTCCGTG TTACAGATGT CAGTTTTGGC AGTTTTCAGG CGTGTGGTAG TTAAGTGTCT
TGTTTCAACA AAATCTGTAA TATGACAGTT TTCTAGCAAG TGCTGGTAAA ATATCTTGAG
GAAGGAAAAG AGAAATCTGG TAGGTATTTT TACAAGAGAA TATTTAATAC AGGGGATTAA 
TTGCAAAGCT GCTGGAAGGG CTGGAGGAAC AAAGTTAAAA AATAAAAAAC TCTGTGGTCA 
AGAATCTGCA TAAATAGGGC AATTTCAGAG AGTGGTAAAG GTTAACCCCA AAATAAAACA 
TGGTTTTAGG ATAGTAAACA ATAAGGGCCA ATATTCAAAA AGGTGGTCAG GGGAGCCTCC 
TTGGAGAGGT GGCATTTGAG CAGAGAATGG ATGACACAAA GAAGCTAAAC TCGTGAAGTT 
TAAGGGGAAA GAAAAGGCAC GTGCAAAGGC CCTGAGGtAG TAAGGAATTT GGCTGATTCA 
AAGAAGAAGA GGAAACCAAT GCAACTGGAG AACAAAAGTG GGGGCAACAG TAGAAAGTGA 
CGCTGGAGGT GTAGGCAGGG GCGAATGCTC TGCAAGTATT TCTTGGTCAC CAACACAGAG 
CTTCCCTATG TTCTAATGGA AGCTGTATCT GTTGAGGAAG ACAGAATTTA AAATCAAACT 
GTTACATCAA CCAGCACCCT TCTCTGTATT CAGGCTCCCA AGGGATCTAG AAGGACGTAA 
GTTAACAAGC TCTCATTAGC AGGGTGTGTG TTTCAACAGT AGTTAGGAAG CTGGGGATTC
AGGAGTACTC CAGTCCCATG GCTATGAAAA GCTCCCCCCA AATTGTACAA ACCTGACAAA
TGCAACACCT CCCCAGCTCT CCCCATTTCT TCTCTGTGCC CTGGGTGTGG GGGGGTGGGT

TGCGAGGGGG AAAACTTTTA ACAGAAGAAA GCACATCTCG GCCGGGCGTG GTGGCTCAGk.^ 
CCTGTAATCC CAACACTTTG GGAGGCCGAG GCGGGTGGAT CACTAGGTGA GGAGATCGAG
ACCATCGTGG CTGACAGGGT GAAACCCTGT CTGTACTAAA AACACAAAAA ATTAGOGGGG
CGTGGTGGCA GGCGCCTGTA GTCCCAGCTA CTCGGGAGGC TGAGGCAGGA GAATGGCCTG 
AACGCGGGAG^GCGGAACrrrG .CyVGTGAgCrqS, AGGTTGCAGG^ACTGGACTGC AG£jETGGGGA, v 
ACAGAGTGAG ACTCCGTCTG^AAAAAAAAAA^AAAGAAAAGA AAAGAAATCA, ^CATQTCATTe^ . 
iLj AAGT0GTGGC ATTTAAAACT /ATTTAGCCTT TCTGTAGGCArAGGl^ACiTAT^CTTG^^^eW 

CAGACCTCAA GGTGTTTTTT TGTTTGTTTT TTCATACGGG TGTGTGGTGT GGGTGTGGGG
ACTAAAAGCT ACAAGCAAGA AATAATAACA ACTACAACAA TACTAATACC AATAGTATAA
AAATAATAGC ATCTGGCTAA TTGCTGGACA CTGTTTTAAG TGGTTTGCAT GCCTCAGCTC
ATTAACTCAT TT ACCTGTTA TTATTGG CCC TATTTTAGAA ACAAGGAGCG AAGGCTCAGA

GCAGTTAACT AAGAGCCTGT CAAAAGAAAC TCTGCAGAGA TATTAAAT!FT- AAAAAATAAT& 
GAGAGAAATT - AAACCACAAG^AAAGTTGAAA TTTAGAGGTA CAGGjCAGGTA AGCTTGCTTG * • 
CTTTGAAACA GTGTCTGCTA CTGGGAAAAA GGCAAGTCTT GGCTTTCCTA ATAATTGATA 
CCAGGACTCT GTAATTCATA TTTTGCATGC ATGTAAGTAA GAAATGAAGC CGGGTGCAAT 
GGCACATGCC AGTAATCCCA GCACTCTGGG AGACTGAAGT GGGAAGATCA CTTGAGCTCA 
GGAGTTCAAG ACCAGCCTGG GCAACTAAAA ATTAAAAAAA TAAAAATACT AATTGTTTTT 
ATTTTAGTAG ATTTTATTCA TACCACTTAC ATCATTATTG TAGTATGTAC ATATTTATTT 

cnTi ' crrTT CTTTTCTTTT cttttttgag acggagtctc gctctgtcac ccaggctgga 

GTGCAATGGC ACCATATCAG CTCACTG CAG CATGCGCCTC CTGGGTTCAA GCATTTCTTC 
CACCTCAGCC TCCCAAGTAG CTGGGATAAC AGGCACCCAC CACCATGCCT GGCTATTTTT 
TTTTTTC CGT AGAGATGGGG TTCCACCATG TTGGCCAGGC TGGTCTTGAA CTCCTGACCT 
CCAGTGATCT GCCTGCCTCG GCCTCCCAAA TTGCTGGTAT TACAGGTGTG AGCCACCGTG 
CCCAGGTGGG AGATAGACAT TTCTCTCTAC CTCAAACAGA GGTCCACTCA AGCTACTTTT 
CATTTTCTTC ATAAATATTA GCCGAGTGGC TATTTTGCAC CAGGAATGGT TCCAGGTGCT 
GTGGATATGG CATCAGGCAA AACAGACCAA AAACTTCCTG CCGCGTGGAC CTCATGTTCC 
CCAAGTGGAA GACSCfea^T ^AAAGAGATiiG^ ATAAATATGT- -AGTAAAT-T^ AAAAflJUa^A^ 
AATTAGCGGG GTGTGGTGGC TTGCACCTGT AGTTCCAGCT ACTTGGGAGG CTGAGGTGGG
AGAATTGCTT GAGCCCAAAC GTTTGAGGCT GCGGTAAGGC ATGACTGGAC TGCTGGAGTG
CAGACAGCAG CCTGGGTGAC AAAGGAAGAC GTTTTTGTGA GAAAGAAAAA AAAAAGAGAC
GAAGGGAGGA AGGAGAGAGA AAGGAAGGAA GGAAGGAGAA AGAAAGGAAG GAAGGAGAAA
GAAAGGAAGG- AAGGAAGGAG ^ AAAGAAAGGA AGAAAGAGAA ^ AGAAAGM^^AGAAAGAttAG*^ 
AAAGAAGAAA*GAAAAGAGAG^AGGAAGGtf^V^ 

GTTGAAGAGC AGTGAGTATT ATTATAGGAG GGTAATTATA GGGAGGTATG GGGAATTGAA 
GACAGGAAAC ACAAATTAGT CCAAGCGAAT GGATTTCTAT TGGGAGTGAT TCTGCCCCTA 
GAAGACACTG GCAATACCAG GAGACATTTT TGGTTGTCAC AACTATATGG AGGGGCATTA
CTGGCAACTA ATGGATAGAT GCCAAGTGTG CTGTTCAACA TGCTATGATG CACACGGCAG 
GCCTCCACAA CAAACCATTA TCCAGCTTCA GATGCCCACA GTGCCCAGAT CGAGGAACCC 
TCATC CAGGG GCTGAGAACC GTATTTTTGC AGAAGGGAGG TATAAGGATG GGTTGGTGGA 
GAATGGGGAA GGAAGGTGTG TGTCCAGTAA GAGAAATAAG GCCTGCACAG GCTGGAGGGG 
AGAGTGAGAG AGAAAGGGAG GCGGAGAGAT ACACGATGAG GGAGACAGGC TGGAACAGAA 
AGTAGAGACG AAGATTCGAG ATGTGGAGAG GAAGGGTCAC AGACCCCCCC GAAATGATGT 
GTGGACAACA GGAATCTGGA AGAGGAAGAT GGAGTGGAGA GTGACAAATG GGG TCTAAAG 
GTTGAACTTG GAGGCCAGGC ATGGTGGCTC ACGCCTGTAA TCCCAACACT TTGGAGGCTG 
AGGTGGGCGA ATCACTTGAG GCCAGGAGTT CGAGACCAGC CTGGCCAACA TGGTGAAACC 
CCGTCTCTAC AAAAAAAATA CAAAAAATTA GCCGGGTGTG GTGATGGACA CCTGTAGTCA 
CAGCTACTTG GGAGGCTGAG GCAGGAGAAT TGCTTGAACC CGGGAGATGG AGGCTGCAGT 
GAGCTGAGGT CAGGCCACTG CGCTCCAACC TGGGCAACAG AGTAAGACTC CATCTCAAAA 
AAAAAAAAGC TGGATTTGGA GTGAAATATT AATAACATTC TCCCTCTCTC TCCTTTTGCC 
TGTGTCTCCA TCTCTGTCTT TTTCTGCATT TCTTCATCTC TGTACTTTC C ATCTCTGTGT 
GTCTGTTCCC ATCTGCTTCT CCATCTATGG GCATCTCTGG GTCTCTCATG TCTCCTTCTG 
CCCACTTTGC CACATCTCTG CCTCTCTCAT GCCCCCCTTT CTCTCCTGCA GGGTGATTCT 
GGGGGGCCTG TGGTCTGCAA TGGCTCCCTG CAGGGACTCG TGTCCTGGGG AGATTACCCT
TGTGCCCGGC CCAACAQACC GGGTGTCTAC ACGAACCTCT GCAAGTTCAC CAAGTGGATC
CAGGAAACCA TCCAGGCCAA CTCCTGAGTC ATCCCAGGAC TCAGCACACC GGCATCCCCA
CCTGCTGCAG GGACAGCCCT GACACTCCTT TCAGACCCTC ATTCCTTCCC AGAGATGTTG
AGAATGTTCA TCTCTCCAGC CCCTGACCCC ATGTCTCCTG GACTCAGGGT CTGCTTCCCC
CACATTGGGC TGACCGTGTC TCTCTAGTTG AACCCTGGGA ACAATTTCCA AAACTGTCCA
GGGCGGGGGT TGCGTCTCAA TCTCCCTGGG GCACTTTCAT CCTCAAGCTC AGGGCCCATC
CCTTCTCTGC AGCTCTGACC CAAATTTAGT CCCAGAAATA AACTGAGAAG

CTTGAACCCA GGAGGCAGAG GTTGCAGTGA GCTGAGATCG CGCCACTGTA CTTCAGCCTG
GGTGTCAGAG CAATACTCCG TTTTGGAAAA CAAACAAACA AACAAACAAA CAAAAAACAG
ATGGAGCAAC TGAGAGAGGT CTTGTGACTT GCCCAAAGTC ACACACCTCA TCACTAATCA 
CACCTAATCA TTGAGATTTG GACACACATG GTTCAGTTCC AGAGTC CATG CTCCAAACCA 
TGACGACACA GTGAGAGAAC ATTCAAGGGG AGCCCAGACC CAGCTTCATA ACCAGGCCTG 
TGAGCAGGAG AAAGTGGAAG GGATCGTAAG TGCCCAGGGG AGGCAAAGAT GGACTCTGCC 
TGAGGATCTC AGAGATTTCC TGGAGGAGGG AGAATTGAGG TTGGGTGTTG AAGGATGAGT 
GGGAGTTCAC CAGGAAAAGA AGGATATGGA GAAAGACATT CACTCATTCA ATGAACATCT 
CCTGAGGACT TCTGCAAGCC CTGTTCCGCC TGGAACGGGG TGATGCTGGG ACACAGAGAT 
GAGTCAGACC TGGGCCCAGC CCTCCAGAAG CTGTCCACCT GGTGAGAAGG AATGATGAGG 
AG AG AGG CAG GGAGGATGGG GTGATGGAAG GGACAATGGG GTGGGGGGCA GGGAGATGGA 
TGAAAAAAAT ATATAGCAAA TGTTCTCAGG ATTTGGCAAA GATCAGGATG TATTAAGAGA 
GAGCACAGGG CACTTGCTAC CTGGAAGGTT GGGCACCTGG GTCCTTGGGT GGTGGAGCCG 
TGGGGAAGGG GGCAGGTTAT GACAAGAGTG GGTTAATCCA GATGGAACCA GATTTCTCAA 
CATTCTAGGA GAGGGCCTTG TCCTTGTGGG AAGAGGCCCA AATCCCCAGG GCAGGGAAGG 
TTCTGCAAGG TGTGTAAACC TGTGCAGCTG CCTGTGGTCT CTGCCTCACT CCACCTGGAT 
TTCCCTCAAT -CTTTCCeGTG TTCTGTGTCC l TCCTCCCAOP *CCTGGTCTCA W TCTTGGGT€C 
TTCTGTGCCT GTACCTCCCT CTCTTTGTAT CTTTTGCTCT TGTGTCTGAG TCGTGACTCT
GTCTTCCACG CCTCGCCTCG TTTCTGGGTG" GTCCCCCTGC AGATeceTCG^AGGCTGGGGT^ 
GGGAGGTTGG TCTCTGCAGA CCACTGGTTT ATCCAAAATA , AACCTGCTGG* ACCCCAGGACt * 
CTTAGGCTTC** AAGGATCTCC CTCCTTTTeC -AGGAGAGAAA\ AGATTGTGTA >TCTTGTAGGG ^ 
TAAGGTGATG AGGAATGAGG TCTCCCAGTC TGAAGAGCCC AGAGGAQGTG CCCAGAAGCX 
CTCCAGACCC CCAGGACTCC TCCTCCATTC AGTCAAGGTC^TGGCCGAG,GA**AGCGGCOAGT#r? 
TCATCGCAAA AGGGGGGTCC CCCTGCACTT ACCTCCTCTC CCAAGGGGGC TGTCACAGCC
CCAGGGCTTG CCCCTCCCCC A GGTACATTT CCCAACCCCG ATTA ATCACA GGGGCGGCCC
CATGGAGGAQ GAAGGAGATQ GCATGGCTTA CCATAAAGAA GTACTGCACG CCGGGTGGAG^ - 
GTTCCAGGAT CCAGG TGCCC AGGGGTCATG AAGCTGGGAGtuTCCTGTGTGG TCTGGTCT^Tl 
CTGCTGGCAG GTGAGGCTCC CAGGCTGGCT GCCGCTTGAC GGeTGTAGTA-AGGTCACCTT- 
GCTCTTGCCT CCGATCGCAG .GCTTGTGCGT CCTGGGGTGT -AGGGTTGTGA gcatcgtctc** 
CCTGCCCTCC ra flgOTGCTC TTCQCTGACC CCTTTGTCGC T CATCCCCAC CCCAGGGCAT 

GGCTGGGCAG acacccgtgc catcggggcc gaggaatg tc gccccaactc ccagccttgg 

CAGGCCGGCC TCTTGCACCT TACTCGGCTC TTG TGTGGGG CGACCCTCAT CAGTGACCGC 
TGGCTGCTCA cagctgccca ctgcggcaag CC GTGAGTGA CCCAGGCTGG CCATGCTGGG 
GAGGGACAGA GGCTGGGGGT CAGGAGAGGG TGAGGGGTGC TTTAGGCCAG AAGTGCGGAG 
CCTCCACTTC TGATACCACA AGTTCAACTC TTAGAAGTAG GAAGGGTAGC CTCCCAAATC 
CTAAAATTCT AGAGACCAGC AATATCTCAT TTGAGAAGTC TAAGATTCGA AACTTAGGCT 
CTTCGAATCC GAGACTGACC CAGAGAAATC CAGAATCGTA GAATCCTAAA ATCTTGAATT 
TATGAAATTC TGCAATAGCC TCAGCAAATT TTAGAATCAT AGATTCGCAG ACTATTAGAA 
TCTTAGCAGT CTGGGTCAGC ACTGCCCAGA GGAATTATGA TGCCAGCCAC ATGTGTAAGT 
TTAAATTTCT GGTGGACACA TTTAAAAAAT AAGGAATGAG TAAAATTAAT TCTAATAGAT 
TTAACTTGAC ATACCCAAAA ACTTATTTTG ACATGTAATC AATTTTTAAA TACGTATGAA 
CGATACAGTT TACTTTTGTT TTGGTACTAA GCCTTTGAAA TCTGTTCTGT ATTTTACACA 
CATAGGCTGT TACAAAATGG AGTAGGGACA TTTCAAGTGT TCAATAGCCA TAATGGGTAG
TGTGATCCTA GAATCTTAAA TTCAGAGCTT TCTAGATTCA TTGAATATTG AAACTCAGAG
TACTAGAATC TTTGATTCAC AGTATCCTAG AATATTGAGA TTCAGATAAT TCTGTAGZUCT
TAAACTATTT GAATCCCAGA CTCTTAAATT TCTAAGGTTA TAGATTTATA GAATGATGAG
ATTCTAGTCT TTCTTTTTTT TTTTTTTTTT TTTTTTTGAG ACAGAGTCTC CCTGTATGTC
CCAGGG-TGGA*GTGCAGTGGG*AGftATGTCAG -CTCACTGGftA^CCTGTGG<^ l G^ i TGGGGTTGAAWr > 
GCAATTCTCC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC AGGTATGCAC CACCATGCCA 
GGCTATTTTT T'l-TTT TT TTT TTTTTTTAGT AGAGACGGGG GTTTCACCAT ATTGGCCAGG 
CTGGTCTTGA ACTCCTGACC TTGTGATCTG CCCGCCTCGG CCTCCCAAAG TGCTGGGATT 
ACAGGCGTGA GCCACCGCGC CCAGCCAAAA TTCTAGTCTT TTTGTCCTAG AACATTAAAA
TTCTATGTTC AAATCTTAGA TTTAATTCAG ATAATGTTAG AATCCTGGAG TTTTTTTGAT 
CCAGGGGAAT CTGGAATGTT AGAATCTTGG ATTCATAAAA CTCTAAACCT TGAGCCTCTA 
GATTCTAGAA TCATGGATAA TAGTGTGTCG GAATCTGAGA ATTCTAGAAT CTTAGGTTCT 
GGGCATTCTA ATAGTATCCT GGAATCCACC TGATGCAGGA ATCCTCTCTC CATTGCCTCT 
GAAAAGTGAC CATCCATACT GTTCCAATTT TCTTCCCTCC ATGAGTAAAG CACTGATTGT 
GGTAAGAGAT GCTGTGTGGG AATTTCCCAT CATGCATTGC TCCATGATGG AACCTCCTTT 
AACTTAAGCC TATACATCAG ACTGGGAGAA CGATGTTCAG ATTTCAGCCG AAAGTGAAGC 
AGGAGAAATG CAGAGATATG AAGGTGGAAG AGAGTGAGAG GCAGGGGAAG GGTAGGGGGA 
TGAAGGGATG TAGGGGTGAG GACTACTTTT CCAGATCCAG AGCCAAGACA GCAAGAATGA 
CAGAGAGAGA CAGACACAGA TGTTTCTGGT TCCCCAACCC TGAATTCGCA GTCATTAGCC 
TGCTGCCTAA TGTCAGAGGT CAGAGGCTGG GGAATGGACT TGTCATCCCC GAAAGGATCC 
CAGCTGTCTA GGGCATGGAC CAGAAATGAA ACAAGTGCGC TGAGACTGTG GTGAGGGCTT 
AAGGTTAGAC ACCAGGAAGA CATGCATTGA AGGGTGAAGG ATATGATAGA CAGGAAAAGC 
TGAGGCCAGA GATGACCCCC AATTTGGGGA TTT T C CAT AT* CCCATCCCCT TTCATACACA 
CGCACACGTA TACACACACA CCACTTAGAC ATACAGAGCC GCTCCCACAG AAGCCACCAG 
ACCTGTGGGG GCAGGGGTGG GGCGGTTGTT ATGTGGTAGG TGGGGTCCCC CGTGCCCACA 
CCGTTCCTAG GGACCCAAGT CACCACCAAG GCTCCAGGTG AGTAGGGAGG AAGGTGGCTC 
ACTCAGCCTG GGACTAGGAG CGGGGGCTTT GTGGGGAGAG CTACAAAGAT GGAGACACAC 
AAAACATCAG AGTGGGGACC AGGGACCCAG AGGAGGTGTG TGCCTCGCTT AAAATCACAG
TACCCTGGGC CAGACATAGA TGATGAGGGT GCAGAGAGGG TGTGTGGCTT GCA GAGG GTC
ACACAGCACC CTGATGGACA GGAAAAGAGG GCTGGGGCTG AAAGGACTTT TACCTTTCCC
CCAG CTTGAC CTCTGAGGCC TGTCCCAGCA G GTATCTGTG GGTCC GCCTT GGAGAGCACC
ACCTCTGGAA ATGGGAGGGT CCGGAGCAGC TGTTCCGGGT TACGG ACTTC TTCCCCCACC
CTGGCTTCAA CAAGGACCTC AGCGCCAATG ACCACAATGA TGACATCATQ CTGATCCGCC
TGCCCAGGCA GGCACGTCTG AGTCCTGCTG TGCAGCCCCT CAACC TCAGC CAGACCTGTG
TCTCCCCAGG CATGCAGTGT CTCATCTCAG GCTGGGGGGC CGTGT CCAGC CCCAAGGGTA
TGACCTGGCC CAGAACTCTC TCTGAAACTT GCTCCCTCAC CCCTCTGTCT CTGCCTTTTC 
ATCTCTGTCT TCTCCTTTTC TCTCTCCTCT CTCTCTCTGT CAGTCTATCT ATCTGCCAAT 
CGATATATTT AACCAAATAT AAGATGCTAG CATTTTTAAG ATGTGCCATT ATTTCATGAA
CTGCGAAGAA GTGGAAGAAG GAGGAGGAGG AGAAGAAAAA AAGGAGGAGG AGGAAAGATC
CCATTAGATC CCATTGATTA TATAACACCA TTTTCTGGAA GACACATTCT AATTTCAGAG
TGTTTGTTTG TTTGTTTGTT TGTTTGTTTT TGAGACAGGG TCTCGCTTTG TTGCTCAGGC
TGGAGTGCAG CGGTGTGATC ACGGCTCATT GCAGCTTTGA ACTCCTGGGC TCAAGTGATC
CTCTCGCCTC AACCTCCCAA GTAGCTGGGA TTACAGATAT GCACCACCAC ATCCCACACC
GGGGTCATTT TTTTATTATT TATTATTATT ATTATTATTA TCTTTTTTTT TGTATTTTTA 
GTAGAGACAG AGGTTTCACC ATATTGGCCA GGCTGGTCTC AAATTCCTGA CCTGGTGATC 
TGCCCGCCTT GGACTCCCAA AGTGCTGGGA AAACAGGCAT GAGCCACTGC ACCCAGCCAA 
AATTCTAGTC TTTTTTAAAT CTAGTCATAT C TT AGATTTA ATTCAGATAA TGTTAGAATC 
CTGGAGTTTT TTGATCCAGG GGAATCTGGA ATGTTAGAAT CTTGGATTCA TAAAACTCTA 
AACGTTGAGC CTCTAGATTC TAGAATCATG GATACTAGTG TGTCAGAATC TGAGAATTCT 
AGAATCTTAG ATTCTGGGCA TTCTAATAGT ATCCTGGAAT CCACCTGATG CAGGAATCCT 
CTCTCCATTG CCTCTGAAAA GTGACCATCC ATACTGTTCC AATTTTCTTC CCTCCATGAA 
TAAAGCACTG ATTCTGGTAA AAGATGCTGG GTGGGAATTT CCCATCATGC ATTGCTCCAT 
GATGGGACCT CCTTTAACTT AAGCCTTATG CTAAAAATTT TTATTATTTT TAGCAAAGAT 
GAGGTCTTGC TATGTTGTCC AGGCTAGTCT CAAACTCCTG GCCTCCCAAA GTGCTGAGAT 
TACAAGTGTG AGCCACTGTA CCTGGCCCAG AGATGTTTAA ATGTGAAATG CGTTCATCTT 
AGAATGGGAA TAAGACCATG TCTCTCAGAG TCACGGATCA CTGACCCATT AGCCAAATTG 
GGTCAGTGGA TTGGAAAAAC AGTCTGAATT TGTTGCTGCC AATATCTAAA ACTTGGAAAG 
TTTTATACAA AAGCCAGGTT TCTGGATTCA CCTGAAAAAG TTTGAAGAAC TCACATTCCC 
AAAATAGCAA GCATTGGGCT GAGTCAATGG AGGCTGCCCC CTTCAGCCAA GATAAGTTCT 
CTGATTCACT CCAATGGACC CAAATGGCTC CTGTCTCCCT GCACAGCCCC CGTCCCCGAC 
TTCTGTTTAC CAATTCTGTT TATCATATCC CTTGATGCAT CGGAGCCTGC ACCCATGTCT 
TATATAGATG CACATGTGTA TTATATATCC ATATCCACAT CTATACTGAC TACACTGTAT 
CTGGTATCTC TGTCTATGTC TCTGTCTCCA TCAGTGACCA TCTTCCTGCA AATCTCTTTC 
CTTTTATCTC ACTGC CTTCA TTCCACCCCT TGAGGTCTGG GTCTTTTTCT ATTTCTTTTT 
TTTTTTTTTT TAAGAGACTG AGTCTTGCTC TTGTTGCCCA GGCTGGAGTG CAGTGGTGTG 
ATCTCGGCTC ACTGCA^CCT CCACCTCCTG GGTTTTAAGT GATCCTCCTG CCTCAGCCTC
CCGAGTAGGT GGGA6TAGAG GTGTGGAACA GCATGCCCAG CTGATTTTTT GTATTTTGAG
TAGAGACGGA GTTTCACCAT GTTGGCCAGG ATGGTCTCAA TCTCTTGAGG TTGTGATCGG
CCCGCCTCAG CCTCCCAAAG TGCTAGGGAG TTATATATGC ATCTCCTGTT ATCTCTTGGC
TCTCTGCATG CATCTTTCTG TTTCTCTTCC TTCGTTTCTT [illegible] TTTTTTTTTT
TTTTTTTTTT TTTTTTGAGA CGGAGTCTTG CTCTGTCTCC CAGGGTGGAG TGCftGTGAGC
AGTCTCGGGT CACTGCAAGG TCCACCTCCC AGGTTCAAGT GATTCTCGTG CCTCAGCGTC
CCGAGTAGCT GGGATTACAG GCGCCTGCCA CCATGCCTGG CTAATTTTTG TATATTTAGC
AGAGATGGGG TTTCACCATG TTGGCTGGGC TGGTCTCAAA CTCCTGACCT CAAGCGATCC
GCCGGCCTCG GCCTCCAAAA CACTGGGATT ACAGGCATGA GCCACGGTGC CCGGCCAGCC
TCTCTCTCTA CTTGGCCCTC TTCCTCCTTG TCTCCATTTG TTTCTCTTGT GTGCTATGAC
TGTCTGTCTG TCACTGTCTC TTGTCT CTAT CTTTGAGAGT CCTAAATGTG GCTCCATTGG
TCCTTTGGAA AAGCTGCAGG GAGGACTCAG GGCAGTGGGG TGCTGAGTGT GTTGGAGACA
GTTGCAGATC CTTGACAGTT CTCTTCCCTG ACAGCGCTGT TTCCAGTCAC ACTGCAQTGT
GCCAACATCA G CAT CCTGGA GAACAAACTC TGTCACTGGG CATACCCTGG ACACATCTCG



GACAGCATGC TCTGTGCGGG CCTGTGGGAG GGGGGCCGAG GTTCCTGCCA GGTGAGACCT 
TACTCTGGGG AAAATGAGGC TGTCCTGCCA AGTTTTCTAG GATTTAGGGG AGCAGAGGGG 
TCGGCCCCCA GCCTTCCTGG GTCAAAATGA GAAGGAGACT GGGATACCTG GTTCCTGGGA 
GAGGACGGGA CCAGGGCCTG GACTCCTTAG TGTAAAAGAG AAAAGGTCTG GAGGTCCAGA 
CTTCTGGATC TACAGGAGGA GTGGGCTGGG CGTGCAGAGT CTGAGTCCTC GGGGAGGAGG 
AGGTTAGGTC CTGSGGGGAG GTGGGCCCTC TGAGCTTTTA CTCGTGGGTC TG AGG AAGAA 
GAGGCTGGAG* ATGGAGGACT CTGGGATGTT GGAGGAGGAA GGGGCTGGGG JCCTTTGTGGG 
AGGGAGGAAG TGGGGGGTGT AATTGTCATG AAGAGAGTGGmGCTAAGAGTT ' CGST^PGGGGT^ 
TCTCTCGCGT AC AGGGTGAC TCTOGGGQCC CCCTGGTTTQ CAATG GAACC TTGGCAGGgG^, 
TGGTGTCTGG GGGTGCTGAG CCGTGCTCCA GACCCCGGCQ -atCCCCQCA GTe^T ACAGCAGCGf|, 
TATGGCACTA CCTTGACTGG ATCGAAGAAA TCATGQAGAA^^G^^TC^^ ^g^^MGg^ 
GGCACCTTGG AAGACGAAGAxGAGGCGGAAG GGCAGGGGGTirAGGQG^ 

CAGCCTCAAT GGTTCCGGCC ; CTGGACCTCC AGCTGCC?GTG»AC^G<^GT^^GT^GGA CAC^AAG 
ACTCCGCCGC rTGAGGCTCCGrCCCCGTCACG AGGTCAAGCAtAQACACAGTC GCGCCCCCTCg 
GGAACGGAGC AGGGACACGC CCTTCAGAGC* CCOTCTCTAT GAC GTCACCG ACAQCCATGA*!^ 
CCTCCTTGTT GGAACAGCAC AGCCTQTGGC TCCQCGCCAA GGAACCACTT ACACAAAATA«E& 
GCTCCGCCCC TCGGAACTTT GCGCAGTGGG ACTTCCCCTC GGGAGTGGAC CCCTTGTGGG
CCCGCCTCCT TCACCAGAGAlfCTCGCCCCT CGTGATGTCA GGGGCGCAG T AGCTCCGCCC 
ACGTGGAGCT CGGGCGGTGT AGAGCTCAGC CCCTTGTGGC CCCGTCCTGG GCGTGTGCTG 
GGTTTGAATC CTGGCGGAGA CCTGGGGGGA AATTGAGGGA GGGTCTG GAT ACCTTTAGAG 
CCAATGCAAC GGATGATTTT TCAGTAAACG CGGGAAACCT GA 
ATTAAGAAGG ACC CAG AC AT ACAACCTCTA AATTCTGAGG GTCATCCAGT AGAATATTCC
AT AT ATG TAT ATATGAAATA TCCTATATCT GTGCTGTCCA ATTATCCACT AGCCCCTTCA 
GGCTATTGAA CATTTGAAAT ATGGCTGGTG TGACTTAAGA ACTGAATTTT TAATTTAGTT 
TTACTTCATT TTAATTAGTT TAAATTTAAA TAG CCACATG TAGCTAGTGG CTACCATATT 
AAACAACATA GGTCTGGAGA AAGGACTGTG CAGAGAGAGG AAATAGCAAG TATAAAATGT 
CTAGTATGGG GGCATCCAAG ATGATTTAAA TTCTTCTTTT CTTTAAATGC CTGGTGTGTT 
TGAAGAACAG GCC CATGAGG CTGGACTAGA GGAAGTCAGA AGAAAGAGGT TGGAGATGGG 
GTCAAAGAGG CTGGCAAGGG CGAGACAGCA CAGAGTCCTG CACAC CTTGG GAAGGCTTTT 
TGGATTTTAT TTTAAAGAAA GTTGAGCCTG GGAACAACAT CTGACTTTCT TTGTTTGAAG 
AGTCCTCAGC CTACTTTGAG AAGACTGGAT CGGAGGGATG TAAAAGTGGA AGGATTTAGG 
TTAATGTTGT AGTCATTTGG GCTACAGAAG ATGGGGCATG GACCAAGATG GTGGCAGAAG 
TGTGGAGATA ACTGGATATT TGGGAGATAA AACCAATAGG AACTGGTTGT GAGTGATGAA 
GGAAAGAAGA GAAGCAAAGA TGACTCCCAG GTTTGGGGCT GAGCACTGAG GTGGGAAATA 
CTGGAGCGAA CAGTTTTGAT TGAGAAGAAT CAAGTTGGGA ATACAAAGCT TAAGATGCCT 
GTAAGGCATC CAAATCAACA GTGTTTGAGT TTTGAGCTTA AAGAAGAGTT CAGGGCTGGA 
GATGATTAGC CTATAGCTGG TATTTAAAGC CATGGAGGCA ACCAGTATAT ATGCAGTGAA 
AGGATAGAGA GATGGGTGGA AAGATGATTG GATGGATGCA TGGATGGATA TATGGATAGA 
TGGATGGATG GATGGTTGGA TTGGATGGAT GGATGGATGG ATGGATGGAT GGATGGATGG 
ATGGATGGAT GAATAAATGG ACCAGTGGAT GGAGGGACAG ATGAGTGGAT GGATGGTTGG 
ATGGATGGAT GGATGGATGG ATGGATAGAT GGTTAGATGA CTAC CTAAAT GGATGAATGG 
ATAGATGGAT GAGTAGACGG ATGGACAAAT CAATAGGATG AATGGGGGAT GGATGATTGG 
ATAGATTGAT GGATAGATAT TGCCTAGGTG GATGTGTAGG TCAGTCTCAC TTCTACCTCC
TGAAATCCAT CTTCTGGTAG AATGATATAA AAAATGCATG TGGAGAGAAA GTCAGGCTCC
TGCTTACCTA TCAGCAACAT CCTCATTTTG TGAACTCTTC TGTTAACCCC CAGTGGAGGA
TTTGGTACTT CCTGAGAAAA TAATGTCACC CCTTTGCCCT AATTCATCTC CACTTGGTCA
AGAATAGCAA CTGCCATAGG TCGGCAAATT CATCTTCAGT TCCTGGTCAC CCAGGGCAAT
AATCCGACCC TTACCCCAAA CCCAGAAACC ACAACCCCAG GGCTCCTCTG CCCCCTGGAT
CCCAGTTTTC TAACAATCTC TCTTCTTTAC CAGGTGTCTC CCAGGAGTCT TCCAAGGTTC
TCAACACCAA TGGGACCAGT GGGTTTCTCC CAGGTGGCTA CACCTGCTTC CCCCACTCTC
AGCCCTGGCA GGCTGCCCTA CTAGTGCAAQ GGCGGCTACT CTGTGGGGGA GTCC TGGTCC
ACCCCAAATG GGTCCTCACT GCCGCACACT GTCTAAAGGA G TATGTGGGG GCCGGGGGAG
CATGGGGTAG GGATGAGAAT GGGACTGGGA TTGTGGATGG GGTAGAGTTG GATTTGAGGA
TGGAGTTGGA GTTAGGGTTG GGGATGGACA TGGGAGTGAG AATGAGGTTT GGGGTTGAGA

TATGGGGATT GGGTATGGGA ATAGAATCAA AGTAGGGGAT TTGGATGGGA TTGAAGTTGA 
GGATGGGGGA GATGTATTTG GAGATGAGGA AGGTAGGATG GAGAAGAAGT TAGGTTGGGG 
ATGGGAAGAG GTTGGGGCTG GGATGGGGAT GGAAATGGGC TCATCTTCTT TCCTAACCAC 
CTTCTTTCTG CACCCACA GG GGGCTCAAAG TTTACCTAGG CAAGCACGCC CTAGGGCGTG 
TGGAAGCTGG TGAGCAGGTG AGGGAAGTTG TCCACTCTAT CCCCCACCCT GAATACCGGA 
GAAGCCCCA CCCACCTGAA CCACGACCAT GACATCATGC TTCTGGAGCT GCAGTCCCCQ 
GTCCAGCTCA CAGGCTACAT CCAAACCCTG CCCCTTTCCC ACAACAACCG CCTAACCCCT 
GGCACCACCT GTCGGGTGTC TGGCTGGGGC ACCACCACCA GCCCCCAGGG TATGCACCCA 
CACAGGTGGC CTGAGGCCCC ATAGGAGTGG CTGGGGAAAC AGGGGCAGAG ATGGGAGGGA 
AGGTCTGAGG 

TAGGTTCCTT TATATATAAA AATATAAATA AGTAAATAAA TATATATATT TAAAGTTAGC 
TGTATCCTTT ATATAAATAT AAATTCATGA ATATATAAAA ATATGAGTAT ATAAATTCAT 
GAATATATAG AAATATAAAT AGATCTAATA TATGAATATA TTATATGATG TATATTATGT 
ATTATATAGT AATATAATTA TATATTATAC AAAAAGTATA CAAATTAAAT GTATTTTATA 
AATTATAAAA TTTATCAATT ATGTATTTTA AATATGTATT TCTG CAT AAT GTATATATTA 
TATATAATCT ATATTTAAAT TATATATTAT AAATGTATTT TATAAATGTA TACATTTATA 
TATTTATATA CTG TAAATGA ATTTTATCAT TTATAATATA TAAATCATAC ATATAAAATG 
TTTATATTTC TAT AAT TT AT AAAATGTTTA ATATATTAAA TATGGTTATT AATGAAATGT 
CTAATAATTC AATGTAATAA TTAATTCTAT ATCATTACTT AGTAAGTATA ATACATTATA 
T ATG TG AAT A TAAAGTTGAT GTATATACCG ACAAGAGCCC TTTGCATCTC CCTAGCAATC 
CCTGACTCTC TCCCAGCCTC ATGTTTGTAT CTTTCTCCTC AACATGCCCT GTCTCTCTTC 
CTACCATECT ATCGAACTGT CCGGTAACTC TTGCeATCCC TGTTCCTGCT TTTCGCATCT
TTAATTCTCT ATTTCTGAGG ATCTCCCTAT TTCCAACTCCG TCTGTCGAAC TTTCTGTCGG
CACCGCTGGC TCCACCACTC TCGTTATCAA CCTTGCATTC TCTTGTCCGT TCGCTCGTTG
TCCTTCCCTC CACTTTTCTC CTCATCTCTC CCTTCGCCTC TCTCCCATGT GCCTCCATAT 
TTCTGTCACT TCCGTTGCTT TACCCAGATA GGTGCTCATC TCTTCTCOCA TCTTTCTCTT 
CCCATCTCAA TTTTCTATGT ACTCTTTACC CATTCAACTC GCCTATTTGA CCTTCATCGC
ATATCCTATC CAGGTCGGAT ACCTTAGACC TTCTCTTTGT TCTCCCGA GT GAATTACCCC 
AAAACTCTAC AATGTGCCAA CATCCAACTT CGCTCAGATG AGGAGTGTCG TCAAGTCTAC 
CCAGGAAAGA TCACTGACAA CATGTTGTGT GCCGGCACAA AAGAGGGTGG CAAAGACTCC 
TGTGAG GTGA GGCCGGGAGG CTGGTGGGTG CCTTGGACAG GATAGAAAGC CAGAATGGAA 
GTGACAGATG CTGGGGAAAA AGCTTTGTTT CCAGCCTJTAG GGGAACCAAT CTTTATAAGA 
TACAATGTCC CCTCACATAG GAGGTCAAGA CAAAAAGGGG TACCCAGGGA TGGCAGGAAT 
AATTCATCAT AAGCCCCAGC TTTGACTGAG TGGCTGCCAA GATCCCTGTG TTGAGATGCA 
TAAAGGTTGG TATTCTTTCA CTTGTGAGTG ATAGACAACC AACTCAAACT GGCTTAAACA
AAATGCAGGC TTTTGTAACT GAAAATCCAG GTTGTCTGGC TTTAGGCACA GATGGATCCA 
GGTATGCAAA TTGTGTGTTT GGAATTCTGT CTTTCTTTTA ACTCTCAGCT CTTCTTTATT 
CTGTTTTGGC TTCATTCTCG GTTAGATTCT TCCCATGACA AGATGGCCCC AGCAGCTTTG 
AG CTT ACATC CTACCCTCTA GGCAACCCTA TTAGAAAGAG AACCTCTCTT TTCCAATAGT 
TCACACAAAA GTCTTAAGCA TGATTCTCAC TAGGCTGACC TAAGT CATGT GTCTTGAGCC

ATCACTGCAG^CAGAGCTGTG GGATTCTGTG" ATGGGCGAAG^ CCTGAGTGACv ATAGTTiiA'GT ■*"'' 
j s J GTGGGTGGTC-GAGAGGGGCA GGGACAAACT GCATGGATTG GAAGTGGAGA^^AGGGGAGTTe- 

|JJ CCCAAATGAA AAAATGAGGA GAGGCTGTTA CGAAAATAAG GGGAAft&GGGt* CAAGTAGAGT^ 

AGTTCATGCC TGTAATGCCA GCACTTTGGG- AGGCTGAGGT GAGAGGATTA CTTGAGGGGA*" 
IP GGAGTTTGAG ACCAGGCTGG GCAACATAGT IGAGACTGTGT^CTCTACAAA^ AGAAAAAAAAl „, 

n GTTTTTAAAT TAG£CAGQT£^TGG3£GAGT^ ^ 

|;J AGGCAGAAGG ACTATTTGAft^CCGAGGAGTT* CAAGGGTGCSA * GTGAGGTM<3^ATClATGCGAC^t 

TGCACTCC^G^CCTGGGTGAT^AGAGGAAGGC-'CCTGTCTCT 

AGCAAGAGAC TGTCTCTAAT^ATiATAAATTVA" ATAAAAATT-T *AAAAATX£AAT V ?,GTTTAATTTT ^ 
TTAAAAATAA GAGGAAATGG ATACTACATG AGCAAAAAAT AGCGTTCATG AATAAAGAAG
TTGAGATTGG ATTCAGTGAG AAAGAGTATG ATACTATATT AATGATATCT GCGTTGATGG
ATTAGTGATG TCTGCCTTGG GCCCAGGAAG AGAAATAGAC TTACACGTGT GTTGCATACG
*0 CTGCCCAGASV ^TGAATgGGT TCACTggAATA . GTGAGAGACA CAAATGAGCG - TTAAATAGGA 

GCAGGGTCAG CTGGTGTGGG GCAGGGGGTG ATTTAGTACC AGGGAAACAA AAATGGGTAT 
GAAGTAAGTT GTTACCATTT TAATGAAACT GAGGAACAGA GAAAAACACA GAAATTTCTC 
TGTGTCTCTC TTTCTCTGGG CCTATCTCTG TCTTTCTGTC CCTATTTCTG TCTCTTGCTG 
TCTGTCCCTC TGTGTTTGTC TTCTTGTCTG TTTCTCACTG TCTTCATTGC TTTCTCTCAC 
ACTGTGTGTG TCTGACTCTG CCTCTCTGAG TCTCCTTCTC TGTGTGTGTC TCTCTCCATC 
TTTCACTCTC TCCCCACACC TCCCTGTCCC TGCCTTGTTT AGCCCCAGCA AGGACCCACC 
TCTCTCTCTC TTTCTTTCCC CAACTCA GGG TGACTCTGGG GGCCCCCTGG TCTGTAACAG 
AACACTGTAT GG CATCGTCT CCTGGGGAGA CTTCCCATGT GGGCAACCTG ACCGGCCTGG 

TGTCTACACC CGTGTCTCAA gatacgtcct gtggatccgt gaaacaatcc GAAAATATGA 
AACCCAGCAG CAAAAATGGT TGAAGGGCCC ACAA7AA 
GTCATATTACATGAGGGCTCTGCTAGACTCCGAAAAACAAAAAACAGCAC
AAAGTTCCCTTGTCCTGTGACTCATTCTCTCTCTCTCTTTCTACCATTTC 
TCCTTCCCTGTGTCT1 1 1 1 11 1 1 rCTCTCTGTGGGTTTTATTTAAGCAAT 
AGAAGTTCTTAGCAAAGAAAAACTTTATGGAATTAGATTGATCCACTTCA 
TATGTACATATATGAACTCAGTTCAGAAACTCTCTTCTACCCCTGCCTGA 
TCACCTATTTGGAAGTCTGTTCCTTCAACTCTTCTTCTCTTTCTGGGACT 
CTTTCTAGCTTGGGCTTCCTGCCCCTCCCGTCCACTCTCCTGCTTTCACA 
GCCTCTCCTTCCCCCTGCCCCTCCCCTGCACTGCATGGGGATGGGCCCCA 
GGTGTCCAAGGTCTCCCCACCCTCCTTTGTCACTGGAGTCAGGATTAGAA 
CCCAGCTCCCTAGTCACCTTGAGTCATCAGTCCTGGGGCTGCTGACGGGC 
TTGCAGAGGAGAGAGGGAGTGGGGCTGGGTCTTCCCACCCTGGGTCCTTT 
CCTCCTTCCCCACTCCGTTTAGCTGTAAAGCTCAATTAAGTGTGATTAGC 
TGAGAAGAGTTTCTGCAGAATTAGAGCACGCCCCACCCCTGTCTTCGTGG
TCCCCTTCCCTTAACCCGGAAACTGGATGGGCCAGGACAAAGAGAGTTAA
GAGCTTTGTCAGTGGTCTGTCTGGAGCGACAGATGGAAGGAAAGGGACCG
GTTGAGCAACATGACAGGTGGCTGAGGAGCCAGGTGCAGAGTGGTAGAGT
TGGCTGGCGGAGTGGCCAGCACATGAGAAGACAGGCAGGTAGGTGGACGG
AGAGATAGCAGCGACGAGGACAGGCCAAACAGTGACAGCCACGTAGAGGA
TCTGGCAGACAAAGAGACAAGGTGAGAAGGAGGTAGGCGACTGCCAATGA
GGGAGTGACACACAGGGGAGCAGGTAGAGAGAGGACAAGCAGGTCATCCC
CTTGGTGACCTTCAAAGAGAAGCAGAGAGGGCAGAGGTGGGGGGCACAGG
GAAAGGGTGACCTCTGAGATTCCCCTTTTCCCCCAGACTTrGGAAGTGAC
CCACCATGGGGCTCAGCATCTTTTTGCTCCTGTGTGTTCTTGGTGAGTTC
TCCCGGAGCAGGGAGAGGGCAGGACTGCGACTGGATCCCTTCACCCCCAT
GAGGAGGCCCCACCACCCTCCCCATCTCAGCTCTGGCCCCCAGCCTGGTG
GTGAGGAGGAGAGGGGCTTTCTCTGTGCCTCCATTTACCTGCAGCTCTCA
GGGTACTGCTCACCTCGGTCTCCCCTATTTTTTGATCCCTCTTCCCTTCT 
GTCCCTCTCTGAATCTCTGTCTCTCCATTTCCCTCCTATGTGTAAGCATC 
TTTCTCCCTGGGTGTCTTTGATGTTTCATGGTCTTTTTCTATCACTGGGT 
CTCTCTCTCTTTCTCTCTCTTTCTCGTCTCTCTTTCTCCTCTCTCTCTCC 
TGCCTGTTTCTCTCTGTCACTCTGTGTGTCTCTCCATCTCTGTATC1 1 11 
rTTrrTrTr^rTnAC.r.C.ATOC.C.C.CTG,TCTGTCTCCA GGGCTCAGCCAGGC 1588-1747 
AGCCACACCGAAGATTTTCAATGGCACTGAGTGTGGGCGTAACTCACAGC <i) 
CGTGGCAGGTGGGGCTGTTTGAGGGCACCAGCCTGCGCTGCGGGGGTGTC 
CTTATTGACCACAGGTGGGTCCTCACAGCGGCTCACTGCAGCGGCAGG TA 
AGTCCCTTCCTGGGGTGGGCGAAGGGAGGACTATGGGAAGGCAAGCGCTG 
GGGGTAGGATCACAAGGGAGGGTGGTGCCCACTGGGAAGAAGCTGATCCT 
GCAACAAGAGAGTCTGAGGTTAGACCAGGAGTGGAACTTCCTTAGCAGTG 
GGCCTGGGGTGGTGCTGGGCAGGGTGAGGTATGTTGGGTGGAGGGCCGGG 
GAGGGTCCTGGAACCTGCCCTCCTGCCTCTCCCATTCCTGCATGTACCCT 
TTCTTTCCTATATGACATCTGCCACTCACCCCAGCCATT CCTTGACCCAG 
TCTGGGGCCGGGGCCCAGGTCTCACCCAAGCTCTTTTTCTTTTTCTTTTT 
TTTATTTTTTTGAGACAGGGTCTCGCTCTGTCGCCCAGGCTGCTGTGCAA 
TGGCGTGATCACAGCTCACTGCTGTCTCTGCCTCCCAGGTTCAAGTGATT 
CTCCTGCCCCAGCCTCCTGAGTAGCTGGGATTACAGGCACCCGCCACCAT

GCCCAGCTAATTTTTGTATTTTTTGTAGAGACAGGGTTTTGCCATGTTGG 

CCAGG€TGGTeTeGAACTeeTGGCCTCAAATGAC€TG<S®e©3(eTTGG€iSF 

CCCAAAGTGCTGGGATTACAGGTGTGAGCCACTGGAGCCGGGCAAGAXGA 

CCCAAACTCTTTGTGCAACTTCAGAATCTATGCCTGGG AGCTCTCTGGG C 

CTCAGTAGACTGATGTTCTGGAATTTTTTTCTTTTTCTTTCTTTT^ 

TTTTTTGGAGACAGAGTCTTGCTCTTTCTGTCATCCAAGGTGGAGTGC^G 

TGATGCTATCTTGGCTCACTACAGCCTCAACCACCTGGGCTCAAGTGATC 

CTCACACCTCAGCCTCCCAAGGAGCTAAGACTACAGG CCTG CGCCACCAC 

ACCTGGCTAATTTTTAAATTTTTTTTGTAGAGACAGGGTTTTGCTATGTT 

ACCCAGGCTGGTCTCAAACTCCTCAGCTCAAGCAATCTTCCTGCCTTGAC 

CTCCCAAAGTGCTGGGATTACAGGCATGAGCCACTGTGCCTGGCCTGGAA 

CTTTTTTTGTGAAAGGGGAGATCAGATGCAAAGAAACAGAGACTCAGGGA 

GAGAGAGGGCCAGCAGCAGGATGCAGAGAGGCCATTCATCAACCCACTCG

TTCAATCATGAACCGACTCGTCCACGCATGAGCATGGAGGGCACATGCTC 

CGTGCCAGGCGGTGGGAATAAGGCAGTGAACAAGGTCCACTGATGTCCCT 

GCCTTCATGGGCTTCACCAGCCGAGAGAATCAGAAAGAGAGGCCTGGCGC 

GGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGA 

TCACTTGAGGTCAGGAGTTTGAGACCAGCCTGACAeACATGGTGXAACCT 

TATCTCTACTAAAAATACAAAAATTAGCTGGGGATGGTGGCATGGTTCTG 

TAATCCCAGCTACTTGGGAGGCTGAGGCAGGTGAATTGCTTGAACCTGGG 

AGGTG@AGGTTGmGTGAGCeAAGA^GGTG€CA^GC^<^e€AG<|^GGG^ 

CGAGAGAGGGAGA@fTCGGTGOGAsAAAA-AAAAA\AAA$sAAAAAA^ 

GAGAGAGAGAGAGAT0©AGGGAGATGGTAGGAGAAACAGGGAAC^GGCAAw 

GATGGAAAGAGGGTGATGGAGGTTGGGAATAAGAQCCTGTAAGAGAGACt " 

CGGAGAATGAGAGTTGGGGGTGAGAGGAGAGACAGTGAGGGGG'AGAACAG 

TGGGGAGCGGCAGGAGCGCCTGAGTGTGCGTGGAGGGGTGGAAG©TGGGG»i- 

GACTGCGTGCCTGCGACGCGCrCAGGCGTCGCCACCGGGAGCAGGJ^CTG. 3592-3851 

fiGTGCGCCTGGGGGAACACAGCCTCAGCCACCTCGACT GGACCGAGCAGA g> 

TCCGGCACAGCGGCTTCTCTGTGAGrCATCGCGGCTAC rTCrtGAGCCTCG 

ACGAGCCACGAGCACGAGCTCCGGGTGCTGCGG CTGCGCCTGCCCGTCCG 

CGTAACCAOrAGCGTTCAACrCCTGrrCCTGCCCAATGACTGTGCAACCG 

CTGGCACCnAGTGCCAGGTCTCAGGGTGGGGCA TCACCAACCACCCACGG 

AGTAAG<K3GGCCAG<KK:CAGGGGTCAGGGGTCAGGATGGGTACAAGTCTG 

GGATGCAGGGCGAGAGGTCGAATCATGACACCTCAGAGGAAGGATGGGTA 

AAGGGTCAGGGTGTGGGATGGGACATCAGGATCATGGTTTGGGGTCAGAG 

ATTATGGTGGATTGGGGTCTTGGGAGCCAAAGGGGTTAAAGGACTGGGTA 

TGAAGTCAGGGATCAGAGGTCAGAGGTCAGAGTGTGTCAGAGGTCATCAC 

ACTGGAGCAAAAGGCATATATATATATATATGTATGTATAGGATATGGGC 

ATTGTGG<}TCATGGGT€'TGG©©lTAGAGGTCACCGTA"G^^^AAGG3SGAT*i, 

GGGATCCAGAGGTTGTACAATCTGGTCAAAATCTGAGGATGGAAATTGGG

ATTCTATCCAAAATCAGATATCTGAGATTGGAGGTCATA®eG-EKF©G©GT 

GTGG<3GGCCGAAGTTTGG<3GTCATG<3AGGe-TGGGG€€CAA^AT^A6jrAGGA .» 

TCAGGGGAGACi;GGGGTTGGA^GGAGTGAG<3=TTTGGAA©A^GGAGAGGTG 

AGGTTGGAGGTTAAGGTAAAGACAGGGACATGGGGTCAGGAGACAGAAGA 

TATGAGATCAAGCTGGGATCATAAGGTAATAAGACAGAAGGTCAAAGATC 
ACAGTAGCTGGCATTGAAGAGGGTCAGGTCTGGATTCGTTGTCTCTGACG

CTGGAGAGACAAGAAAGTTCTTGAGTTATGCCACTCAAAGTCAAATGTCA 

AAGATCAAAGAGACCGTCAATCATCTGGGGTCATGATTCATATGAAATTA 

AGTCATAAATATGTAACTTGGAGGTTTCGGGATTGTAGTACAGGTCGGTG 

AGGGGCAGGGGTATTGACATGGATGGGCCACATCCAGGGAAGAGGGACGT 

GGCCTCAAAGTGGGGAGATTTAGGGGACCCTGCAGCACGCATGTTCTCTC 

TrrA nArrrATTCCCGGATrT^rTrCAGT HrCTCAACCTCTCCATCGTCT 4806^939 

CCCATGCCArCTGCCATGGTGTGTATCr rGGGAGAATCACGAGCAACATG (3) 

GTGTGTGCAGGCGGCGTCCCGGGGCAGGATGCC TGCCAGGTGAGCCAGTG 